DISCREPANCY, CHAINING AND SUBGAUSSIAN PROCESSES

By Shahar Mendelson 

Israel Institute of Technology and The Australian National University 

We show that for a typical coordinate projection of a subgaussian
class of functions, the infimum over signs
$\inf_{(\varepsilon_i)}\sup_{f\in F}|\sum_{i=1}^k\varepsilon_if(X_i)|$
is asymptotically smaller than the expectation over signs as a function
of the dimension $k$, if the canonical Gaussian process indexed by
$F$ is continuous. To that end, we establish a bound on the discrepancy
of an arbitrary subset of $\mathbb{R}^k$ using properties of the canonical
Gaussian process the set indexes, and then obtain quantitative structural
information on a typical coordinate projection of a subgaussian
class.



1. Introduction. The geometric structure of a random coordinate projection
of a class of functions plays an important role in Empirical Processes
theory, where it is used to determine whether the uniform law of
large numbers or the uniform central limit theorem is satisfied by the given
class. Indeed, if $F$ is a class of functions on a probability space $(\Omega,\mu)$ and
if $\sigma=(X_1,\ldots,X_k)$ is an independent sample distributed according to $\mu^k$,
then the "complexity" of the random set

$$P_\sigma F=\{(f(X_1),\ldots,f(X_k)) : f\in F\}\subset\mathbb{R}^k$$

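is the key parameter in addressing both these questions. In this context, if
$(\varepsilon_i)_{i=1}^k$ are independent, symmetric, $\{-1,1\}$-valued random variables, then
the complexity is governed by the expectation of the supremum of the
Bernoulli process indexed by $P_\sigma F$, defined by

$$\mathbb{E}_\varepsilon\sup_{f\in F}\Big|\sum_{i=1}^k\varepsilon_if(X_i)\Big|=\mathbb{E}_\varepsilon\sup_{v\in P_\sigma F}\Big|\sum_{i=1}^k\varepsilon_iv_i\Big|,\tag{1.1}$$
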
and in particular, on the way this expectation grows as a function of k for a 
typical sample of cardinality k (see, e.g., [3, 8, 19] and references therein). 
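For a finite class, the quantity in (1.1) can be estimated numerically by sampling signs. The following Python sketch is only an illustration under stated assumptions: $F$ is finite, the values $f(X_i)$ are already tabulated as the rows of a matrix, and the function name `bernoulli_sup_mc` and the toy class are hypothetical.

```python
import numpy as np

def bernoulli_sup_mc(T, n_samples=20000, seed=0):
    """Monte Carlo estimate of E_eps sup_{t in T} |sum_i eps_i t_i|.

    T is an (m, k) array: each row is one point t of a finite set in R^k,
    e.g. the row (f(X_1), ..., f(X_k)) for one function f in a finite class F.
    """
    T = np.atleast_2d(np.asarray(T, dtype=float))
    rng = np.random.default_rng(seed)
    # Each row of eps is a uniformly random sign vector in {-1, 1}^k.
    eps = rng.choice([-1.0, 1.0], size=(n_samples, T.shape[1]))
    # values[r, j] = |<eps_r, t_j>|; take the sup over t, then average over eps.
    values = np.abs(eps @ T.T)
    return float(values.max(axis=1).mean())

# Hypothetical toy class of mean zero functions, F = {x -> x, x -> x**2 - 1/3},
# evaluated on a sample drawn uniformly from [-1, 1].
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=200)
P_sigma_F = np.vstack([X, X ** 2 - 1.0 / 3.0])
print(bernoulli_sup_mc(P_sigma_F) / np.sqrt(len(X)))
```

Dividing by $\sqrt{k}$ shows the normalization under which the growth of (1.1) in $k$ is usually discussed.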
The structure of such coordinate projections is central to questions in
Asymptotic Geometric Analysis as well. For example, let $K\subset\mathbb{R}^n$ be a convex,
symmetric set (i.e., if $x\in K$ then $-x\in K$) and let $F=\{\langle x,\cdot\rangle : x\in K\}$
be the class of linear functionals indexed by $K$. If $\mu$ is a measure on $\mathbb{R}^n$,
then $P_\sigma F$ is the set $\Gamma K$, where $\Gamma$ is the random operator $\Gamma=\sum_{i=1}^k\langle X_i,\cdot\rangle e_i$.
Fundamental questions on the geometry of convex, symmetric sets, such as
Dvoretzky's theorem and low-$M^*$ estimates, have been answered by obtaining
accurate, quantitative information on the structure of such coordinate
projections, and by using complexity parameters very similar to (1.1) (e.g.,
[8, 15]).

For both these reasons, a lot of effort has been invested in understanding 
various notions of complexity for a typical coordinate projection of a class 
of functions. A well-studied direction is to obtain quantitative estimates on
the way in which (1.1) is related to two other complexity parameters, the 
combinatorial dimension and covering numbers. 

Roughly speaking, the combinatorial dimension of $T\subset\mathbb{R}^k$ at scale $\varepsilon$,
denoted by $\mathrm{VC}(T,\varepsilon)$, is the largest dimension of a coordinate projection of $T$
that contains a "cube" of scale $\varepsilon$ (see Definition 6.2 for an exact formulation).
If $(T,d)$ is a metric space, then the covering number at scale $\varepsilon$, which we
denote by $N(\varepsilon,T,d)$, is the smallest cardinality of a subset $\{y_1,\ldots,y_m\}\subset T$
such that for every $t\in T$ there is some $y_i$ for which $d(t,y_i)<\varepsilon$.
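Covering numbers of a finite set can be bounded from above by a greedy construction. The sketch below (hypothetical helper `greedy_net`, Euclidean metric assumed) builds a maximal $\varepsilon$-separated subset. Such a subset is also an $\varepsilon$-cover, so its size lies between $N(\varepsilon,T,d)$ and the packing number $D(\varepsilon,T,d)$ defined in Section 2.

```python
import numpy as np

def greedy_net(points, eps):
    """Indices of a greedily built eps-net of a finite set T in R^k (rows of `points`).

    A point joins the net when its Euclidean distance to every current net point
    is at least eps. The net is therefore eps-separated, and every point of T is
    at distance < eps from it, so len(net) is an upper bound for N(eps, T, l_2).
    """
    P = np.asarray(points, dtype=float)
    net = []
    for i, p in enumerate(P):
        if all(np.linalg.norm(p - P[j]) >= eps for j in net):
            net.append(i)
    return net

pts = np.array([[0.0], [1.0], [2.0], [3.0]])
print(len(greedy_net(pts, 1.5)))  # 2: the points 0 and 2 cover {0, 1, 2, 3}
```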

Connections between (1.1) and the combinatorial dimension or the covering
numbers of $P_\sigma F$ are rather well understood. For example, a straightforward
chaining argument (see, e.g., [19]) shows that for every $T\subset\mathbb{R}^k$,

$$\mathbb{E}_\varepsilon\sup_{t\in T}\Big|\sum_{i=1}^k\varepsilon_it_i\Big|\le c\int_0^{\mathrm{diam}(T)}\sqrt{\log N(\varepsilon,T,\ell_2^k)}\,d\varepsilon,\tag{1.2}$$

where $\ell_2^k$ is the Euclidean metric on $\mathbb{R}^k$, $\mathrm{diam}(T)$ is the diameter with respect
to the same metric and $c$ is an absolute constant, independent of the
dimension $k$ and of the set $T$. Some of the other relations between these
parameters are far more involved. First, the problem of controlling the $L_2$ covering
numbers using the combinatorial dimension was resolved in [12], where it was shown
that if $T$ is a subset of the unit cube and $\mu$ is any probability measure
on $\{1,\ldots,k\}$, then for every $0<\varepsilon<1$,

$$N(\varepsilon,T,L_2(\mu))\le\Big(\frac{2}{\varepsilon}\Big)^{K\cdot\mathrm{VC}(T,c\varepsilon)},$$

where $K$ and $c$ are absolute constants. Also, the solution of the sign embedding
of $\ell_1^k$ problem (see [12]) implies that if $T\subset B_\infty^k$ and
$\mathbb{E}\sup_{t\in T}|\sum_{i=1}^k\varepsilon_it_i|\ge\delta k$, then $\mathrm{VC}(T,c_1\delta)\ge c_2\delta^2k$.
In other words, under a normalization condition ($T\subset B_\infty^k$), the only reason
that $\mathbb{E}\sup_{t\in T}|\sum_{i=1}^k\varepsilon_it_i|$ is almost extremal
is that $T$ contains a high-dimensional cubic structure.
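The entropy integral on the right-hand side of (1.2) can be approximated for a finite $T\subset\mathbb{R}^k$. This is a rough sketch and is not part of the paper: the helper names are hypothetical, $N(\varepsilon,T,\ell_2^k)$ is replaced by the greedy upper bound (a maximal $\varepsilon$-separated subset), the integral is discretized by the midpoint rule, and the absolute constant $c$ in (1.2) is not computed.

```python
import numpy as np

def greedy_cover_size(P, eps):
    """Size of a greedy maximal eps-separated subset of the rows of P.

    Every row lies at distance < eps from this subset, so the size
    upper-bounds N(eps, T, l_2).
    """
    net = []
    for p in P:
        if all(np.linalg.norm(p - q) >= eps for q in net):
            net.append(p)
    return len(net)

def dudley_integral(points, n_grid=200):
    """Midpoint-rule approximation of int_0^diam sqrt(log N(eps, T, l_2)) d eps,
    with N replaced by the greedy upper bound above."""
    P = np.asarray(points, dtype=float)
    diffs = P[:, None, :] - P[None, :, :]
    diam = float(np.sqrt((diffs ** 2).sum(axis=-1)).max())
    if diam == 0.0:
        return 0.0
    h = diam / n_grid
    mids = (np.arange(n_grid) + 0.5) * h
    return float(h * sum(np.sqrt(np.log(greedy_cover_size(P, e))) for e in mids))

# Two points at distance 1: N(eps) = 2 for 0 < eps <= 1, so the integral is sqrt(log 2).
print(dudley_integral([[0.0, 0.0], [1.0, 0.0]]))
```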
In this article, we study a related geometric parameter: the discrepancy
of a typical coordinate projection. Discrepancy was introduced in a combinatorial
context (see the book [11] for an extensive survey on this topic) and
is defined as follows.

Definition 1.1. If $T\subset\mathbb{R}^k$, then the discrepancy of $T$ is

$$\mathrm{disc}(T)=\inf\sup_{t\in T}\Big|\sum_{i=1}^k\varepsilon_it_i\Big|,$$

where the infimum is taken with respect to all signs $(\varepsilon_i)_{i=1}^k\in\{-1,1\}^k$.
We denote by $\mathrm{Hdisc}(T)$ the hereditary discrepancy of $T$, given by

$$\mathrm{Hdisc}(T)=\sup_{I\subset\{1,\ldots,k\}}\mathrm{disc}(P_IT),$$

where $P_IT=\{(t_i)_{i\in I} : t\in T\}$ is the coordinate projection of $T$ onto $I$.

Observe that if $\mathrm{absconv}(T)$ is the convex hull of $T\cup-T$, then $\mathrm{disc}(T)=
\mathrm{disc}(\mathrm{absconv}(T))$. Hence, from the geometric viewpoint, the discrepancy of
$T$ is proportional, with constant $\sqrt{k}$, to the minimal width of $\mathrm{absconv}(T)$
in the direction of a vertex of the combinatorial cube $\{-1,1\}^k$. The hereditary
discrepancy is governed by a similar minimal width, but of the "worst"
coordinate projection of $\mathrm{absconv}(T)$.
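For very small $k$, Definition 1.1 can be evaluated by exhaustive search: over all $2^k$ sign vectors for $\mathrm{disc}$, and in addition over all coordinate subsets for $\mathrm{Hdisc}$. The brute-force sketch below uses hypothetical names `disc` and `hdisc`, takes $T$ as the rows of a matrix, and is exponential in $k$. It is meant only for checking toy examples.

```python
import itertools
import numpy as np

def disc(T):
    """disc(T) = min over eps in {-1,1}^k of max_{t in T} |sum_i eps_i t_i| (brute force)."""
    T = np.atleast_2d(np.asarray(T, dtype=float))
    k = T.shape[1]
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    # rows: sign vectors; columns: points t; take the sup over t, then the inf over signs
    return float(np.abs(signs @ T.T).max(axis=1).min())

def hdisc(T):
    """Hdisc(T) = max over nonempty I of disc(P_I T); the empty set contributes 0."""
    T = np.atleast_2d(np.asarray(T, dtype=float))
    k = T.shape[1]
    best = 0.0
    for r in range(1, k + 1):
        for I in itertools.combinations(range(k), r):
            best = max(best, disc(T[:, list(I)]))
    return best

print(disc([[1.0, 1.0]]), hdisc([[1.0, 1.0]]))  # 0.0 1.0
```

The pair $\{(1,1)\}$ shows why the hereditary version can be much larger: the signs $(+,-)$ cancel the full vector, but a single coordinate cannot be cancelled.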

Our goal here is to study the discrepancy using the covering numbers
and the combinatorial dimension of $T$, but we will focus on sets $T$ that are
random coordinate projections of a class of functions $F$, which gives them
more structure. A natural question in this context is to identify conditions on
$F$ under which there is a gap between $\mathrm{disc}(P_\sigma F)$ and
$\mathbb{E}_\varepsilon\sup_{v\in P_\sigma F}|\sum_{i=1}^k\varepsilon_iv_i|$
for a typical coordinate projection of $F$, as a function of the sample size $k$. To
that end, we will develop dimension-dependent bounds on the discrepancy
of $P_\sigma F$ (and in particular, bounds that are not asymptotic).

Note that the metric structure of $T\subset\mathbb{R}^k$ is not enough to determine its
discrepancy. Indeed, if $e_1=(1,0,\ldots,0)\in\mathbb{R}^k$ and $T_1=\{0,e_1\}$, then $\mathrm{disc}(T_1)=1$.
On the other hand, if $T_2=\{0,k^{-1/2}(1,\ldots,1)\}$, which is linearly isometric to
$T_1$, then $\mathrm{disc}(T_2)\le1/\sqrt{k}$. The reason for the large gap in the discrepancy
between the two isometric sets is that $T_2$ consists of a vector that is "well
spread", while $T_1$ consists of a "peaky" vector with respect to the underlying
coordinate structure. In that sense, $T_2$ is in a much better position than $T_1$.
Note that in this example,
$\mathbb{E}\sup_{t\in T_1}|\sum_{i=1}^k\varepsilon_it_i|\sim\mathbb{E}\sup_{t\in T_2}|\sum_{i=1}^k\varepsilon_it_i|\sim1$,
and for the set $T_2$, which is in a "good position", there is a gap between the
expectation over signs and the discrepancy.
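The example of $T_1$ and $T_2$ can be checked exactly for a small odd $k$ by enumerating all sign vectors. The sketch below assumes $k=5$ and uses a hypothetical helper `exact_stats`. It computes both the discrepancy and the exact average over signs of $\sup_{t\in T}|\sum_i\varepsilon_it_i|$.

```python
import itertools
import math
import numpy as np

k = 5
signs = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))  # all 2^k sign vectors

def exact_stats(T):
    """Return (disc(T), exact average over signs of sup_{t in T} |<eps, t>|)."""
    vals = np.abs(signs @ np.asarray(T, dtype=float).T)  # rows: signs, columns: points t
    sup_over_T = vals.max(axis=1)
    return float(sup_over_T.min()), float(sup_over_T.mean())

e1 = np.eye(k)[0]
T1 = [np.zeros(k), e1]                            # a "peaky" vector
T2 = [np.zeros(k), np.ones(k) / math.sqrt(k)]     # a "well spread" vector, isometric to T1

disc1, mean1 = exact_stats(T1)
disc2, mean2 = exact_stats(T2)
print(disc1, mean1, disc2, mean2)
```

For $k=5$ this gives $\mathrm{disc}(T_1)=1$ with average $1$, while $\mathrm{disc}(T_2)=1/\sqrt5\approx0.447$ with average $1.875/\sqrt5\approx0.839$. The two averages are comparable, but the discrepancies are not.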

We will show that this is a general phenomenon. It is well known that
$\mathbb{E}_\varepsilon\sup_{t\in T}|\sum_{i=1}^k\varepsilon_it_i|$ is determined by the Euclidean metric structure of $T$
(up to a logarithmic factor in the dimension), and therefore it is almost
invariant under a linear isometry (i.e., a change in the coordinate structure).
Thus, the expectation almost does not change when applying an isometry
or a good isomorphism of $\ell_2^k$. As we will explain here, the situation with the
discrepancy is rather different, and the position of the set matters a great
deal. Since the sets $T$ that we will be interested in are not arbitrary but have
some structure, as random coordinate projections of well-behaved function
classes, they will be much closer in nature to $T_2$ than to $T_1$.

Our main result is that if the canonical Gaussian process indexed by
$F\subset L_2(\mu)$ is continuous and if the class satisfies a subgaussian condition
[i.e., if the $\psi_2(\mu)$ norm is equivalent to the $L_2(\mu)$ norm on $F$, see Definition
2.4], then a typical coordinate projection of $F$ behaves as a set of vectors in
"general position." As such, and just like the set $T_2$, a typical coordinate
projection exhibits certain shrinking properties that will be explained in
Section 4, and which cause the discrepancy of such a set to be much smaller
than the average over signs.

Theorem A. Let $F\subset L_2(\mu)$ be a class of mean zero functions. Assume
further that the canonical Gaussian process indexed by $F$ is continuous
and that the $L_2(\mu)$ and $\psi_2(\mu)$ norms are equivalent on $F$. Then
$\mathrm{Hdisc}(P_\sigma F)/\sqrt{k}\to0$ in probability.

To put Theorem A in the right perspective, observe that if the $\psi_2$ and
$L_2$ norms are equivalent on a class of mean zero functions $F$, then for every
integer $k$ there is a subset of $\Omega^k$ of probability at least $c$ on which

$$\mathbb{E}_\varepsilon\sup_{f\in F}\Big|\sum_{i=1}^k\varepsilon_if(X_i)\Big|\ge c_1\sqrt{k}\,\sigma_F,\tag{1.3}$$

where $c$ depends only on the equivalence constant between the $\psi_2$ and $L_2$
norms on $F$, $c_1$ is an absolute constant and $\sigma_F=\sup_{f\in F}(\mathbb{E}f^2)^{1/2}$. Hence,
there is a true gap between the discrepancy and the mean of a typical coordinate
projection.

Although the formulation of Theorem A is asymptotic, the result itself
is quantitative in nature, as a function of the dimension of the coordinate
projection. The proof of Theorem A is, in fact, a dimension-dependent estimate
on sequences $(\alpha_{k,\delta})_{k=1}^\infty$ for which, with probability at least $1-\delta$,
$\mathrm{Hdisc}(P_\sigma F)\le\sqrt{k}\,\alpha_{k,\delta}$. We will show that the sequences $(\alpha_{k,\delta})_{k=1}^\infty$ can be bounded
using metric parameters that measure the continuity of the Gaussian process
indexed by $F$: Talagrand's $\gamma_{2,s}$ functionals [18]. The $\gamma_{2,s}$ functionals will be
defined in Section 2, but for now let us mention that under mild measurability
assumptions on the class, the canonical Gaussian process indexed by
$F$ is continuous if and only if $\lim_{s\to\infty}\gamma_{2,s}(F,L_2(\mu))=0$.
We will prove that for every $0<p<1/2$ and $0<\delta<1$ there are constants
$c$ and $C$ that depend on $p$, $\delta$ and on the equivalence constant between the
$\psi_2(\mu)$ and $L_2(\mu)$ norms on $F$, such that for every

T 72,iog2 iog2 cn (F, L2 ifi) ) • Vlog(e/c/n) + D 
(1.4) 

where $D=\mathrm{diam}(F,L_2(\mu))$ is the diameter of $F$ with respect to the $L_2(\mu)$
norm. In particular, under the assumptions of Theorem A, for every
$0<\delta<1$, $\lim_{k\to\infty}\alpha_{k,\delta}=0$. Moreover, the proof of Theorem A actually shows
that for every $k$, with $\mu^k$-probability of at least $1-\delta$,

$$\mathrm{disc}(P_\sigma F)\le C\big(\sqrt{k}\,\gamma_{2,\log_2\log_2ck}(F,L_2(\mu))+k^pD\big),$$

where $C$ and $c$ depend on $\delta$ and the equivalence constant between the
$L_2(\mu)$ and $\psi_2(\mu)$ norms on $F$.

The proof of Theorem A is based on two ingredients. The first is a new
estimate on the discrepancy of an arbitrary set $T\subset\mathbb{R}^k$. It is a combination
of the entropy method, which is often used to control the combinatorial
discrepancy (see, e.g., [1, 11, 16]), and Talagrand's generic chaining mechanism
[18], which was introduced to establish the connection between the
$\gamma_{2,s}$ functionals and the continuity of Gaussian processes. The combination
of these two methods will be explained in Section 3. It allows one to find
a good choice of signs for roughly $k/2$ coordinates, while the error incurred
by considering the sum taken only on these coordinates is determined by
the $\gamma_{2,s}$ functional for $s\sim\log_2\log_2k$. Repeating this argument, one obtains
a bound on the discrepancy of $T$ in terms of a sum of $\gamma_{2,s}$ functionals of
coordinate projections of $T$, for values of $s$ that depend on the dimension
of each projection, and those dimensions decrease quickly.

The second component required for the proof of Theorem A is that the
sets we are interested in are not general. We will obtain a structural result on
a typical $P_\sigma F$ that allows us to bound the $\gamma_{2,s}$ functionals of its coordinate
projections using the $L_2(\mu)$ structure of $F$.

Indeed, we will show that if the $L_2(\mu)$ and $\psi_2(\mu)$ norms are equivalent on
$F$, then a typical coordinate projection $P_\sigma F$ has a rather regular structure:
it is a subset of a Minkowski sum of two sets. The first one is small, with
a bounded diameter in the weak $\ell_2$ space $\ell_{2,\infty}$. The other set is a subset of
$P_\sigma F$ itself and can be viewed as a set of vectors in "general position." In
particular, further coordinate projections of the latter set shrink distances
between any two of its elements.

The structural result we obtain is of independent interest and can be used
to derive information on the geometry of convex sets. For example, consider
a symmetric probability measure $\mu$ on $\mathbb{R}^n$. We say that $\mu$ is isotropic and
$L$-subgaussian if a random vector $X$ distributed according to $\mu$ satisfies, for
every $x\in\mathbb{R}^n$,

$$\mathbb{E}|\langle X,x\rangle|^2=|x|^2\quad\text{and}\quad\|\langle X,x\rangle\|_{\psi_2}\le L|x|.$$

Simple examples of isotropic, $L$-subgaussian measures on $\mathbb{R}^n$ are the Gaussian
measure and the uniform measure on the vertices of the cube $\{-1,1\}^n$,
where in both cases $L$ can be taken to be an absolute constant, independent
of the dimension.
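Isotropy of the uniform measure on $\{-1,1\}^n$ can be verified exactly by enumerating the cube, since $\mathbb{E}\varepsilon_i\varepsilon_j=0$ for $i\ne j$. This small sketch (hypothetical helper name) checks only the second-moment condition, not the $\psi_2$ bound.

```python
import itertools
import numpy as np

def cube_second_moment(x):
    """Exact E <X, x>^2 for X uniform on the vertices of {-1, 1}^n."""
    x = np.asarray(x, dtype=float)
    cube = np.array(list(itertools.product([-1.0, 1.0], repeat=x.size)))
    return float(np.mean((cube @ x) ** 2))

x = np.array([0.3, -1.2, 2.0, 0.5])
print(cube_second_moment(x), float(x @ x))  # both equal |x|^2 = 5.78 up to rounding
```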

Let $(X_i)_{i=1}^k$ be independent random vectors, distributed according to $\mu$,
and consider the random operator $\Gamma:\mathbb{R}^n\to\mathbb{R}^k$ defined by
$\Gamma=\sum_{i=1}^k\langle X_i,\cdot\rangle e_i$.

Corollary B. For any $L>0$ there are constants $c_1,c_2$ and $c_3$ that
depend only on $L$, for which the following holds. Let $T\subset\mathbb{R}^n$ and set
$V=k^{-1/2}\Gamma T$. Then, for every $u\ge c_1$, with probability at least $1-2\exp(-c_2n)$,
for every $I\subset\{1,\ldots,k\}$,



Eg sup 



< C3U 




Kg sup 

teT 



1=1 



where $(g_i)$ are independent, standard Gaussian random variables, and both
expectations are taken with respect to those variables.
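The shrinking phenomenon behind Corollary B can be observed in a simulation. The sketch below is illustrative only and does not reproduce the constants of the corollary. It assumes Gaussian $X_i$ (an isotropic, subgaussian choice), a random finite $T$, one random coordinate subset $I$ for each size, and Monte Carlo estimates of the Gaussian averages. All names are hypothetical. Heuristically, for fixed $t$ the sum $\sum_{i\in I}g_iv_i$ has variance close to $(|I|/k)|t|^2$, so the left-hand side should shrink as $|I|$ decreases.

```python
import numpy as np

def gauss_sup(A, rng, n_samples=4000):
    """Monte Carlo estimate of E_g sup_{a in rows(A)} |sum_i g_i a_i|."""
    g = rng.standard_normal((n_samples, A.shape[1]))
    return float(np.abs(g @ A.T).max(axis=1).mean())

def shrinking_demo(n=20, k=400, m=50, sizes=(10, 50, 400), seed=1):
    """Compare E_g sup over P_I V, V = k^{-1/2} Gamma T, with E_g sup over T itself."""
    rng = np.random.default_rng(seed)
    T = rng.standard_normal((m, n))       # an arbitrary finite T in R^n, one point per row
    X = rng.standard_normal((k, n))       # X_1, ..., X_k: standard Gaussian vectors in R^n
    V = (T @ X.T) / np.sqrt(k)            # row j is k^{-1/2} (<X_i, t_j>)_{i=1}^k
    full = gauss_sup(T, rng)
    proj = {}
    for size in sizes:
        I = rng.choice(k, size=size, replace=False)   # a random coordinate subset
        proj[size] = gauss_sup(V[:, I], rng)
    return full, proj

full, proj = shrinking_demo()
print(full, proj)
```

With $|I|=k$ the two averages should be comparable, while for $|I|=10$ the projected average should be several times smaller.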



Corollary B shows that the random operator $\Gamma$ maps an arbitrary $T$ to a
set of vectors in "general position" in a strong sense, since it implies that
for most vectors in $V$, mutual distances are shrunk by any further coordinate
projection. Let us note that we will prove a stronger result than Corollary
B, namely that the $\gamma_{2,s}$ functionals associated with $V$ display this type of
shrinking phenomenon.

The final result we present has to do with the reverse direction of Theorem
A. Assume that $H\subset L_2(\mu)$ is a convex, symmetric set such that the
canonical Gaussian process $\{G_h : h\in H\}$ is bounded and the
$L_2(\mu)$ and $\psi_2(\mu)$ norms are equivalent on $H$. We will show that if the logarithm
of the $L_2(\mu)$ covering numbers of $H$ grows like $\varepsilon^{-2}$, then for a typical
sample $\sigma=(X_1,\ldots,X_k)$ selected according to $\mu^k$,

$$\mathrm{VC}(P_\sigma H,c_1/\sqrt{k})\ge c_2k.$$

It is standard to verify (see Lemma 6.5) that if $T\subset\mathbb{R}^k$, then

$$\mathrm{Hdisc}(T)\ge\sup_{\delta>0}\delta\,\mathrm{VC}(\mathrm{absconv}(T),\delta).$$

Therefore, if $F$ is a class of mean zero functions and $H=\mathrm{absconv}(F)$ satisfies
the above, then $\mathrm{Hdisc}(P_\sigma F)\ge c\sqrt{k}$, complementing the upper bound
established in Theorem A.
Although this is not exactly the reverse direction of Theorem A, it is very
close to it. Indeed, if $F\subset L_2(\mu)$ indexes a bounded Gaussian process, then
so does $H=\mathrm{absconv}(F)$, and the logarithm of the covering numbers of $H$
cannot grow faster than $O(1/\varepsilon^2)$. On the other hand, if the log-covering
numbers grow a little slower, even by a suitable logarithmic factor, then
$\gamma_{2,s}(F,L_2(\mu))\to0$. In fact, this is as close as one can get to a covering
numbers characterization of the fact that $\lim_{s\to\infty}\gamma_{2,s}(F,L_2)=0$ (see, e.g., [3]).

This result not only shows that $\mathrm{Hdisc}(P_\sigma F)$ is large if the Gaussian process
$F$ indexes is bounded but not continuous; it also shows why. Under
a boundedness assumption on the Gaussian process [which implies that
$\mathrm{Hdisc}(P_\sigma F)/\sqrt{k}$ is bounded], the reason the hereditary discrepancy of $P_\sigma F$
is extremal is that a typical coordinate projection of $\mathrm{absconv}(F)$ contains
a high-dimensional, large cubic structure, and that forces the hereditary
discrepancy to be large. The proof of this result, which is presented in Section
6, is based on the observation that if $F$ is convex and symmetric, then the
richness of $F$ at scale $\sim1/\sqrt{k}$ is exhibited by the existence of cubes of scale
$\sim1/\sqrt{k}$ and of dimension $\sim k$ in a typical coordinate projection of $F$ of
dimension $k$. It should thus be viewed as a "small scale" version of the Sign
Embedding theorem mentioned above.

Unfortunately, the optimal estimate in the Sign Embedding theorem cannot
be used directly in our case, firstly because $P_\sigma F$ is unlikely to be a subset
of $B_\infty^k$, and secondly, because a typical coordinate projection of $F$ satisfies

$$\mathbb{E}\sup_{f\in F}\Big|\sum_{i=1}^k\varepsilon_if(X_i)\Big|\sim\sqrt{k}.$$

Hence, the optimal estimate in the Sign Embedding theorem has to be used
for $\delta\sim1/\sqrt{k}$, and that only ensures that $P_\sigma F$ contains a cube of scale
$\sim1/\sqrt{k}$ and of constant dimension, which is far from what we need.

The proof of the existence of a cube in $P_\sigma F$ is based on two localization
arguments, one with respect to the $L_2$ norm and the other with respect to
the $L_\infty$ norm. The first localization shows that if the $L_2(\mu)$ covering number
of $F$ at scale $\sim1/\sqrt{k}$ is of the order of $\exp(c_1k)$, then the richness of a typical
coordinate projection of $F$ of dimension $\sim k$ originates from the set

$$F_1=F\cap\frac{c_2}{\sqrt{k}}B(L_2(\mu)),\tag{1.5}$$

that is, functions in $F$ of $L_2(\mu)$ norm at most $O(1/\sqrt{k})$. In the second
localization, one shows that the complexity of a typical coordinate projection
actually comes from a further pointwise truncation of the functions in $F$,
and $B(L_2(\mu))$ in (1.5) can essentially be replaced by $B(L_\infty(\mu))$, the unit
ball in $L_\infty(\mu)$.
This article is organized as follows. In Section 2, we present further preliminaries,
most of them concerning subgaussian variables and the $\gamma_{2,s}$ functionals.
In Section 3, we develop bounds on the discrepancy of an arbitrary
subset of $\mathbb{R}^k$. Section 4 is devoted to the proof of the structural results on
coordinate projections of subgaussian processes and their corollaries, including
Corollary B. Theorem A is proved in Section 5, and its converse and the
resulting lower bound on the hereditary discrepancy of a typical coordinate
projection are proved in Section 6.

2. Preliminaries. Throughout, absolute constants (i.e., fixed, positive
numbers) will be denoted by $C,c,c_1$, etc. Their values may change from line
to line. We denote by $C(a),c(a)$ constants that depend only on the parameter
$a$, and we set $\kappa_1,\kappa_2,\ldots$ to be constants that will remain fixed throughout
the article. By $a\sim b$, we mean that there are constants $c$ and $C$ such that
$ca\le b\le Ca$, and we write $b\lesssim a$ if $b\le Ca$.

We will consider a single, fixed Euclidean structure on all finite-dimensional
spaces and denote the corresponding Euclidean norms by $|\cdot|$ without
specifying the dimension. With a minor abuse of notation, the cardinality
of a set and the absolute value are denoted in the same way.

If $E$ is a normed space, let $B(E)$ be its unit ball, and for $E=\ell_p^k$ we denote
the unit ball by $B_p^k$. If $\sigma=(X_1,\ldots,X_k)\in\Omega^k$, let $\mu_\sigma=k^{-1}\sum_{i=1}^k\delta_{X_i}$ be the
empirical measure supported on $\sigma$, set $L_2(\mu_\sigma)$ to be the corresponding $L_2$ space,
and for $I\subset\{1,\ldots,k\}$ let $\ell_2^I$ be the coordinate subspace of $\ell_2^k$ spanned by
$(e_i)_{i\in I}$.

The situation we will study here is as follows. Let $F$ be a class of real-valued
functions on a probability space $(\Omega,\mu)$, let $X_1,\ldots,X_k$ be independent
random variables distributed according to $\mu$ and set $\sigma=(X_1,\ldots,X_k)$. Let
$P_\sigma F=\{(f(X_i))_{i=1}^k : f\in F\}\subset\mathbb{R}^k$ be the coordinate projection of $F$ defined
by $\sigma$, and for every $I\subset\{1,\ldots,k\}$ let $P_I^\sigma F=\{(f(X_i))_{i\in I} : f\in F\}\subset\mathbb{R}^{|I|}$ be the
coordinate projection of $F$ on the subset of coordinates $(X_i)_{i\in I}$. Sometimes,
for the sake of simplicity, we shall omit the superscript $\sigma$.

2.1. Subgaussian processes. Here, we will describe properties of sums of
independent random variables that have quickly decaying tails.

Definition 2.1. Let $f$ be a function defined on a probability space
$(\Omega,\mu)$. For $1\le\alpha\le2$, define the $\alpha$-Orlicz norm by

$$\|f\|_{\psi_\alpha}=\inf\{C>0 : \mathbb{E}\exp(|f|^\alpha/C^\alpha)\le2\}.$$

For basic facts regarding Orlicz norms, we refer the reader to [2, 19].
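For a random variable with finitely many values, the infimum in Definition 2.1 can be found by bisection, because $C\mapsto\mathbb{E}\exp(|f|^\alpha/C^\alpha)$ is decreasing. This is a hedged numerical sketch with a hypothetical helper name. It is not used anywhere in the paper.

```python
import math

def psi_alpha_norm(values, probs, alpha=2.0, tol=1e-12):
    """||f||_{psi_alpha} = inf{C > 0 : E exp(|f|^alpha / C^alpha) <= 2} for a
    discrete f with P(f = values[j]) = probs[j], computed by bisection on C."""
    def mean_exp(C):
        total = 0.0
        for v, p in zip(values, probs):
            if p == 0:
                continue
            z = (abs(v) / C) ** alpha
            if z > 700.0:      # math.exp would overflow; the constraint certainly fails
                return math.inf
            total += p * math.exp(z)
        return total

    hi = 1.0
    while mean_exp(hi) > 2.0:  # C -> E exp(|f|^alpha / C^alpha) is decreasing in C
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_exp(mid) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi

# A symmetric {-1, 1}-valued variable: E exp(1/C^2) <= 2 iff C >= 1/sqrt(log 2).
print(psi_alpha_norm([-1.0, 1.0], [0.5, 0.5]))
```

For a symmetric $\{-1,1\}$-valued variable, the exact values are $\|f\|_{\psi_2}=1/\sqrt{\log2}\approx1.201$ and $\|f\|_{\psi_1}=1/\log2\approx1.443$.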
It is well known that a random variable has a bounded $\psi_\alpha$ norm for
$1\le\alpha\le2$ if and only if it has a well-behaved tail; that is, there is an absolute
constant $c$ such that for every $f\in L_{\psi_\alpha}$ and every $t\ge1$,

$$\Pr(|f|>t)\le2\exp\big(-c\,t^\alpha/\|f\|_{\psi_\alpha}^\alpha\big).$$

Conversely, there is an absolute constant $c_1$ such that if $f$ displays a tail
behavior dominated by $\exp(-t^\alpha/K^\alpha)$ for $1\le\alpha\le2$, then $\|f\|_{\psi_\alpha}\le c_1K$.

There are several basic properties of sums of independent random variables
we require. The proofs of these facts can be found, for example, in
[2, 8, 19].

Note that if $f$ has a subexponential tail, then its empirical means concentrate
around its true mean, with a tail behavior that is a mixture of
subgaussian and subexponential. Indeed, the following result is a version of
Bernstein's inequality and shows just that.

Theorem 2.2. There exists an absolute constant $c$ for which the following
holds. Let $(\Omega,\mu)$ be a probability space and let $f:\Omega\to\mathbb{R}$ be a function
with a bounded $\psi_1$ norm. If $X_1,\ldots,X_k$ are independent and distributed according
to $\mu$, then for every $t>0$,

$$\Pr\Big(\Big|\frac1k\sum_{i=1}^kf(X_i)-\mathbb{E}f\Big|>t\Big)\le2\exp\Big(-ck\min\Big\{\frac{t^2}{\|f\|_{\psi_1}^2},\frac{t}{\|f\|_{\psi_1}}\Big\}\Big).$$

If a function has mean zero and a bounded $\psi_2$ norm, one can obtain a
purely subgaussian tail.

Lemma 2.3. There exists an absolute constant $c$ for which the following
holds. Let $Y_1,\ldots,Y_k$ be independent random variables of mean zero. Then,
for every $a_1,\ldots,a_k\in\mathbb{R}$,

$$\Big\|\sum_{i=1}^ka_iY_i\Big\|_{\psi_2}\le c\Big(\sum_{i=1}^ka_i^2\|Y_i\|_{\psi_2}^2\Big)^{1/2}.$$

In particular, if $(X_i)_{i=1}^k$ are independent random variables distributed according
to $\mu$ and $f$ has zero mean, then for every $t\ge1$,

$$\Pr\Big(\Big|\frac{1}{\sqrt{k}}\sum_{i=1}^kf(X_i)\Big|>t\|f\|_{\psi_2}\Big)\le2\exp(-c_1t^2),$$

where $c_1$ is an absolute constant.

In what follows, we will assume that the $\psi_2$ and $L_2$ norms are equivalent
on $F$ in the following sense.
Definition 2.4. A set $F\subset L_2(\mu)$ is $L$-subgaussian if $\|f\|_{\psi_2}\le L\|f\|_{L_2}$
and $\|f-g\|_{\psi_2}\le L\|f-g\|_{L_2}$ for every $f,g\in F$.

Next, let us turn to the definition of the $\gamma_{2,s}$ functionals [18]. Let $(T,d)$
be a metric space. An admissible sequence of $T$ is a sequence of subsets of
$T$, $(T_s)_{s=0}^\infty$, such that $|T_0|=1$ and for every $s\ge1$, $|T_s|\le2^{2^s}$.

Definition 2.5. For a metric space $(T,d)$ and an integer $s_0\ge0$, let

$$\gamma_{2,s_0}(T,d)=\inf\sup_{t\in T}\sum_{s=s_0}^\infty2^{s/2}d(t,T_s),$$

where the infimum is taken with respect to all admissible sequences of $T$.
Set $\gamma_2(T,d)=\gamma_{2,0}(T,d)$.
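Any particular admissible sequence gives an upper bound for the $\gamma_{2,s_0}$ functional of Definition 2.5. The sketch below, for a finite $T\subset\mathbb{R}^k$ with the Euclidean metric, uses an assumed construction that is not taken from the paper: the sets $T_s$ are prefixes of a farthest-point (greedy) ordering of $T$. The helper name is hypothetical.

```python
import numpy as np

def chaining_bound(points, s0=0, s_max=6):
    """Upper bound on gamma_{2,s0}(T, l_2) for a finite T in R^k (rows of `points`).

    Uses one particular admissible sequence: order T by farthest-point traversal
    and let T_s be the first |T_s| points, with |T_0| = 1 and
    |T_s| = min(|T|, 2^(2^s)) for s >= 1. Returns
    sup_t sum_{s >= s0} 2^(s/2) d(t, T_s); all later terms vanish once T_s = T.
    """
    P = np.asarray(points, dtype=float)
    m = len(P)
    order = [0]
    dist = np.linalg.norm(P - P[0], axis=1)  # distance of each point to the current prefix
    while len(order) < m:
        j = int(np.argmax(dist))
        order.append(j)
        dist = np.minimum(dist, np.linalg.norm(P - P[j], axis=1))
    total = np.zeros(m)
    for s in range(s0, s_max + 1):
        size = 1 if s == 0 else min(m, 2 ** (2 ** s))
        prefix = P[order[:size]]
        d = np.linalg.norm(P[:, None, :] - prefix[None, :, :], axis=-1).min(axis=1)
        total += 2 ** (s / 2) * d
        if size == m:
            break
    return float(total.max())

print(chaining_bound([[0.0], [1.0], [3.0]]), chaining_bound([[0.0], [1.0], [3.0]], s0=1))
```

For $T=\{0,1,3\}\subset\mathbb{R}$ it returns $3$ for $s_0=0$ and $0$ for $s_0\ge1$, consistent with $\mathrm{diam}(T,d)\le2\gamma_2(T,d)$.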

Let $\pi_s:T\to T_s$ be a metric projection function onto $T_s$; that is, $\pi_s(t)$ is
a nearest point to $t$ in $T_s$ with respect to the metric $d$. Since
$d(\pi_{s+1}(t),\pi_s(t))\le d(t,T_{s+1})+d(t,T_s)$, it is easy to verify
that for every admissible sequence, every $t\in T$ and any $s_0\ge0$,

$$\sum_{s=s_0}^\infty2^{s/2}d(\pi_{s+1}(t),\pi_s(t))\le(1+1/\sqrt2)\sum_{s=s_0}^\infty2^{s/2}d(t,T_s),$$

and that the diameter of $T$ satisfies $\mathrm{diam}(T,d)\le2\gamma_2(T,d)$. Moreover, it is
clear that the $\gamma_{2,s}$ functionals are decreasing in $s$ and are subadditive in
$T$ in the following sense. Let $X$ be a normed space with induced metric $d$ and consider two sets
$A,B\subset X$. If $A+B=\{a+b : a\in A,b\in B\}$ is the Minkowski sum of $A$ and
$B$, then for every integer $s$,

$$\gamma_{2,s+1}(A+B,d)\le\gamma_{2,s}(A,d)+\gamma_{2,s}(B,d).$$

There is a close connection between the $\gamma_{2,s}$ functionals with respect to $L_2$
norms and properties of Gaussian processes (see [3, 18] for expositions on
these connections). Indeed, let $\{G_t : t\in T\}$ be a centered Gaussian process
indexed by a set $T$, and for every $s,t\in T$ define a metric on $T$ by
$d^2(s,t)=\mathbb{E}|G_s-G_t|^2$. One can show that under mild measurability assumptions on
the process,

$$c_1\gamma_2(T,d)\le\mathbb{E}\sup_{t\in T}G_t\le c_2\gamma_2(T,d),$$

where $c_1$ and $c_2$ are absolute constants. The upper bound is due to Fernique
[4] and the lower bound is Talagrand's Majorizing Measures theorem [17].
The proof of both parts can be found in [18]. Thus, the $\gamma_2$ functional is finite
if and only if the Gaussian process indexed by $T$ is bounded.
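As a numerical illustration of the two-sided estimate above, one can estimate $\mathbb{E}\sup_{t\in T}G_t$ by simulation for the Gaussian process $G_t=\sum_ig_it_i$ indexed by a finite $T\subset\mathbb{R}^n$. The sketch assumes that $T$ is given as matrix rows and uses a hypothetical helper name. For $T=\{e_1,\ldots,e_n\}$, all mutual distances equal $\sqrt2$ and the expected supremum is of order $\sqrt{\log n}$.

```python
import numpy as np

def gaussian_sup_mc(T, n_samples=2000, seed=0):
    """Monte Carlo estimate of E sup_{t in T} sum_i g_i t_i for a finite T (rows)."""
    T = np.atleast_2d(np.asarray(T, dtype=float))
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, T.shape[1]))
    return float((g @ T.T).max(axis=1).mean())

n = 1000
est = gaussian_sup_mc(np.eye(n))   # T = {e_1, ..., e_n}: this estimates E max_i g_i
print(est, np.sqrt(2 * np.log(n)))
```

For $n=1000$ the estimate is roughly $3.2$, somewhat below $\sqrt{2\log n}\approx3.72$.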
Note that if $T\subset\mathbb{R}^n$ and $G_t=\sum_{i=1}^ng_it_i$, then $d(u,t)=|u-t|$, and therefore

$$c_1\gamma_2(T,|\cdot|)\le\mathbb{E}\sup_{t\in T}\sum_{i=1}^ng_it_i\le c_2\gamma_2(T,|\cdot|).\tag{2.1}$$

Just like $\gamma_2(T,L_2(\mu))$ determines the supremum of the canonical Gaussian
process indexed by $T\subset L_2(\mu)$ (which we will always assume to satisfy
the necessary measurability assumptions), the continuity of that process is
determined by properties of the sequence $(\gamma_{2,s})_{s\ge0}$.

Definition 2.6. Let $F\subset L_2(\mu)$ be a class of mean zero functions. Set
$\{G_f : f\in F\}$ to be the centered Gaussian process indexed by $F$ with a covariance
structure endowed by $L_2(\mu)$; that is, for every $f,g\in F$, $\mathbb{E}G_fG_g=
\langle f,g\rangle_{L_2(\mu)}$. We say that $F$ is $\mu$-pregaussian if it has a version with all sample
functions bounded and uniformly continuous with respect to the $L_2(\mu)$
metric.

Theorem 2.7 [17, 18]. Let $\{G_t : t\in T\}$ be a centered Gaussian process
and endow $T$ with the $L_2$ metric $d$ given by the covariance structure of the
process, as above. Under measurability assumptions, the following are equivalent:

1. The map $t\mapsto G_t(\omega)$ is uniformly continuous on $T$ with probability 1.

2. $\lim_{\delta\to0}\mathbb{E}\sup_{d(u,t)<\delta}|G_u-G_t|=0$.

3. There exists an admissible sequence of $T$ such that
$$\lim_{s_0\to\infty}\sup_{t\in T}\sum_{s=s_0}^\infty2^{s/2}d(t,T_s)=0.$$

In other words, $T$ is pregaussian if and only if $\lim_{s\to\infty}\gamma_{2,s}(T,L_2)=0$.

Remark 2.8. Theorem 2.7 is not proved in [18] but only stated there,
and its formulation in [17] was done using the notion of majorizing measures
rather than the $\gamma_{2,s}$ functionals. Since the proof of the continuity theorem
follows from an application of the Majorizing Measures theorem, and
since the latter is proved in [18] using the language of the $\gamma_2$ functional, it
is not difficult to convert the proof of the continuity theorem from [17] and
obtain Theorem 2.7. Moreover, as shown in [17], there is a quantitative connection
between the modulus of continuity of $\{G_t : t\in T\}$ and the sequence
$(\gamma_{2,s}(T,L_2))_{s=0}^\infty$. Since we will not use this quantitative estimate here, we
will not formulate it.

Finally, let us define the covering and packing numbers of a metric space.
Definition 2.9. Let $(T,d)$ be a metric space. The covering number of
$T$ at scale $\varepsilon>0$ with respect to the metric $d$ is the smallest number of open
balls of radius $\varepsilon$ needed to cover $T$, and is denoted by $N(\varepsilon,T,d)$.

We set $e_k(T,d)=\inf\{\varepsilon : N(\varepsilon,T,d)\le2^k\}$; the numbers $(e_k)_{k=0}^\infty$ are called the entropy
numbers of $T$.

A set $A\subset T$ is called $\varepsilon$-separated if the distance between any two of its
elements is at least $\varepsilon$. We denote by $D(\varepsilon,T,d)$ the maximal cardinality of an
$\varepsilon$-separated subset of $T$.

It is standard to verify that for every $\varepsilon>0$, $N(\varepsilon,T,d)\le D(\varepsilon,T,d)\le
N(\varepsilon/2,T,d)$, and thus one can use either one of the two notions freely.

3. The discrepancy of subsets of $\mathbb{R}^n$. We begin this section with a technical
lemma which is at the heart of the proof of Theorem A. The lemma
allows one to find a good choice of signs on roughly half of the coordinates,
while the error incurred by the choice of coordinates and signs can be controlled
using the geometric structure of $T$.

A preliminary result we need has to do with Bernoulli processes: the
well-known Hoeffding inequality (see, e.g., [8, 19]).

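Theorem 3.1. Let $(\varepsilon_i)_{i=1}^n$ be independent, symmetric, $\{-1,1\}$-valued
random variables. Then, for every $a\in\mathbb{R}^n$ and every $t>0$,

$$\Pr\Big(\sum_{i=1}^n\varepsilon_ia_i>t|a|\Big)\le\exp(-t^2/2).$$

In particular,

$$\Pr\Big(\Big|\sum_{i=1}^n\varepsilon_ia_i\Big|>t|a|\Big)\le2\exp(-t^2/2).$$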
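Theorem 3.1 can be checked exactly for small $n$ by enumerating all sign vectors. The sketch below uses a hypothetical helper name and compares the exact two-sided tail with $2\exp(-t^2/2)$ for two choices of $a$. The assertion encodes the inequality of Theorem 3.1.

```python
import itertools
import math
import numpy as np

def exact_two_sided_tail(a, t):
    """Exact Pr(|sum_i eps_i a_i| > t|a|) by enumerating all 2^n sign vectors."""
    a = np.asarray(a, dtype=float)
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=a.size)))
    return float(np.mean(np.abs(signs @ a) > t * np.linalg.norm(a)))

for a in (np.ones(12), np.arange(1.0, 11.0)):
    for t in (0.5, 1.0, 1.5, 2.0, 2.5):
        p = exact_two_sided_tail(a, t)
        bound = 2 * math.exp(-t * t / 2)
        assert p <= bound   # Theorem 3.1
        print(len(a), t, p, bound)
```

For $t<\sqrt{2\log2}$ the bound exceeds $1$ and is trivial. The interesting comparisons are for larger $t$.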



Let us formulate the main lemma. 



Lemma 3.2. Let

$$\Phi(t)=\begin{cases}\log(e/t), & \text{if } 0<t\le1,\\ t\exp(-t+1), & \text{if } t>1.\end{cases}$$

There exist absolute constants $\kappa_1$ and $\kappa_2$ for which the following holds. Assume
that $(\lambda_s)_{s=1}^\infty$ is an increasing positive sequence tending to infinity,
$(Q_s)_{s=1}^\infty$ is a positive sequence and $n$ is an integer such that

$$\kappa_1\sum_{s=1}^\infty\lambda_s\Phi(Q_s^2/2)\le\frac{n}{100}.$$

Let $T\subset\mathbb{R}^n$ with $0\in T$, let $(T_s)_{s=1}^\infty$ be a sequence of subsets of $T$ and
let $T_0=\{0\}$. Consider maps $\pi_s:T\to T_s$ that satisfy:

(a) for every $s\ge1$, $|\{\pi_s(t)-\pi_{s-1}(t) : t\in T\}|\le\lambda_s$,

(b) for every $t\in T$, $\lim_{s\to\infty}\pi_s(t)=t$.

Then, there exists $(\eta_i)_{i=1}^n\in\{-1,0,1\}^n$ such that $n/4\le|\{i : \eta_i=0\}|\le3n/4$,
and for every $t\in T$,

$$\Big|\sum_{i=1}^n\eta_it_i\Big|\le\kappa_2\sum_{s=1}^\infty Q_s|\pi_s(t)-\pi_{s-1}(t)|.$$



The proof is a combination of a chaining argument and the entropy
method, which is frequently used in Discrepancy Theory (see, e.g., [1, 10,
16]). In the chaining mechanism, one takes the sets $T_s$ to be finer and finer
approximations of the set $T$, and $\pi_s(t)$ is a nearest element to $t$ in $T_s$ with
respect to the underlying metric (which is, in our case, the $\ell_2$ metric).

Recall that the entropy of a discrete random variable $X$ taking values in
a countable set $\Omega$ is

$$H(X)=-\sum_{\omega\in\Omega}p_\omega\log_2p_\omega,$$

where $p_\omega=\Pr(X=\omega)$. The entropy $H(X)$ indicates how close $X$
is to being equally distributed; the more equally distributed $X$ is, the larger
$H(X)$ is.

The three facts we will need regarding the entropy are well known and
we omit their proofs. First, if $H(X)\le K$, then there is a value of $X$ that
is attained with probability at least $2^{-K}$. Second, if $X$ attains at most $k$
values, then $H(X)\le\log_2k$, and finally, if $X=(X_1,\ldots,X_m)$ is a random
vector, then $H(X)\le\sum_{i=1}^mH(X_i)$.
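The three entropy facts can be illustrated on a toy joint distribution. In the sketch below (hypothetical helper `entropy_bits`, with uniform weights over a list of outcomes), $X=(\varepsilon_1,\varepsilon_1\varepsilon_2,\varepsilon_1)$ has four equally likely values, so $H(X)=2$, while each coordinate has entropy $1$.

```python
import math
from collections import Counter

def entropy_bits(outcomes):
    """H(X) = -sum_w p_w log2 p_w, where X is uniform over the listed (possibly repeated) outcomes."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# X = (e1, e1*e2, e1) for independent uniform signs e1, e2: four equally likely outcomes.
outcomes = [(e1, e1 * e2, e1) for e1 in (-1, 1) for e2 in (-1, 1)]
H = entropy_bits(outcomes)
marginals = [entropy_bits([o[j] for o in outcomes]) for j in range(3)]
p_max = max(Counter(outcomes).values()) / len(outcomes)

assert p_max >= 2 ** (-H)                          # fact 1: some value has probability >= 2^{-H}
assert H <= math.log2(len(set(outcomes))) + 1e-12  # fact 2: H <= log2(number of values)
assert H <= sum(marginals) + 1e-12                 # fact 3: subadditivity
print(H, marginals)  # 2.0 [1.0, 1.0, 1.0]
```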

In the entropic argument we will use, each "link" in each chain in $T$ is
assigned a random variable $X_a:\{-1,1\}^n\to\mathbb{R}$ that depends on the link and
on the chain. The idea is to show that with probability at least $2^{-n/100}$, for
every $a$, each random variable $X_a$ falls in an interval whose length is at
most $Q_a\|X_a\|_{L_2}$. One would like to make these scaling factors $Q_a$ as small
as possible while still ensuring that conditions 1 and 2 hold, since those
conditions imply that the intersection of the level sets of all the random
variables $X_a$ has the desired measure.

More details on the way entropic arguments have been used in the context 
of Discrepancy Theory may be found in [1, 11]. 

Before presenting the proof, one should mention that a chaining argument
was implicit in Matoušek's result on the discrepancy of a subset of $\{0,1\}^n$
with a bounded VC dimension [10, 11].

The first step in the proof of Lemma 3.2 is the following entropy estimate.
We denote by $[x]$ the integer part of $x$.
Lemma 3.3. There exists an absolute constant $c$ for which the following
holds. Let $a\in\mathbb{R}^n$, set $Z_a=\sum_{i=1}^n\varepsilon_ia_i$ and put

$$W_a=\mathrm{sgn}(Z_a)[|Z_a|].$$

Then

$$-\sum_{i\in\mathbb{Z}}\Pr(W_a=i)\log\Pr(W_a=i)\le c\,\Phi\Big(\frac{1}{2|a|^2}\Big).$$



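Proof. By Hoeffding's inequality (Theorem 3.1), for every $j\in\mathbb{Z}\setminus\{0\}$,

$$p_j=\Pr(W_a=j)\le\Pr(Z_a\ge|j|)\le\exp\Big(-\frac{j^2}{2|a|^2}\Big).$$

Also,

$$p_0=\Pr(W_a=0)=\Pr(-1<Z_a<1),$$

implying that

$$1-p_0=\Pr\Big(\Big|\sum_{i=1}^n\varepsilon_ia_i\Big|\ge1\Big)\le2\exp\Big(-\frac{1}{2|a|^2}\Big).$$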



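Consider $j\in\mathbb{Z}$ for which $|j|\ge\sqrt2|a|$, so that $p_j\le\exp(-j^2/(2|a|^2))\le1/e$. Since $f(x)=-x\log x$ is increasing in
$[0,1/e]$, it follows that for such values of $j$,

$$-p_j\log p_j\le\frac{j^2}{2|a|^2}\exp\Big(-\frac{j^2}{2|a|^2}\Big).$$

Fix an integer $k\ge\sqrt2|a|$, to be named
later, and observe that if we set $S=\sum_{1\le|j|\le k}p_j$, then, since there are $2k$ indices $j$ in this sum,

$$-\sum_{1\le|j|\le k}p_j\log p_j\le-\sum_{1\le|j|\le k}\frac{S}{2k}\log\frac{S}{2k}=S\log(2k/S).$$

Clearly, $S\le1-p_0\le2\exp(-1/(2|a|^2))$, and thus, if $\exp(-1/(2|a|^2))\le1/e$ (i.e.,
if $\sqrt2|a|\le1$), then

$$S\log(2k/S)=S\log k+2\cdot\frac S2\log\frac2S\le\log k+\frac{1}{|a|^2}\exp\Big(-\frac{1}{2|a|^2}\Big).$$

Otherwise, $S\log(2k/S)\le\log k+2/e\le1+\log k$, implying that

$$-\sum_{1\le|j|\le k}p_j\log p_j\le\begin{cases}\log k+\dfrac{1}{|a|^2}\exp\Big(-\dfrac{1}{2|a|^2}\Big), & \text{if }\sqrt2|a|\le1,\\[1ex] 1+\log k, & \text{otherwise.}\end{cases}$$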
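Moreover, since $x\mapsto\frac{x^2}{2|a|^2}\exp(-\frac{x^2}{2|a|^2})$ is decreasing for $x\ge\sqrt2|a|$,

$$-\sum_{|j|\ge k+1}p_j\log p_j\le2\sum_{j\ge k+1}\frac{j^2}{2|a|^2}\exp\Big(-\frac{j^2}{2|a|^2}\Big)\le2\int_k^\infty\frac{x^2}{2|a|^2}\exp\Big(-\frac{x^2}{2|a|^2}\Big)dx\le(k+2|a|)\exp\Big(-\frac{k^2}{2|a|^2}\Big).$$

Therefore, using that $-p_0\log p_0\le1-p_0$,

$$-\sum_{j=-\infty}^{\infty}p_j\log p_j=-\sum_{|j|\ge k+1}p_j\log p_j-p_0\log p_0-\sum_{1\le|j|\le k}p_j\log p_j$$
$$\le(k+2|a|)\exp\Big(-\frac{k^2}{2|a|^2}\Big)+2\exp\Big(-\frac{1}{2|a|^2}\Big)+\begin{cases}\log k+\dfrac{1}{|a|^2}\exp\Big(-\dfrac{1}{2|a|^2}\Big), & \text{if }\sqrt2|a|\le1,\\[1ex] 1+\log k, & \text{otherwise.}\end{cases}$$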

Now, consider the following three cases. First, if $\sqrt2|a|\le1$, take $k=1$, and
thus

$$-\sum_{j=-\infty}^{\infty}p_j\log p_j\lesssim\frac{1}{|a|^2}\exp\Big(-\frac{1}{2|a|^2}\Big).$$

If $1<\sqrt2|a|\le e$, set $k$ to be a suitable absolute constant, and if $\sqrt2|a|>e$,
put $k\sim|a|\log(\sqrt2|a|)$. Therefore, in both these cases

$$-\sum_{j=-\infty}^{\infty}p_j\log p_j\le c_2\log(e\sqrt2|a|),$$

and our claim follows. □
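Lemma 3.3 can be examined numerically for small $n$ by computing the exact distribution of $W_a$. The sketch below uses natural logarithms, as in the lemma, implements $\Phi$ from Lemma 3.2, and prints the entropy of $W_a$ next to $\Phi(1/(2|a|^2))$ for a few constant vectors $a$. The absolute constant $c$ is not computed, and the helper names are hypothetical. Python's `math.trunc(z)` rounds toward zero, which is exactly $\mathrm{sgn}(z)[|z|]$.

```python
import itertools
import math
from collections import Counter

def phi(t):
    """Phi from Lemma 3.2: log(e/t) for 0 < t <= 1 and t*exp(1 - t) for t > 1."""
    return math.log(math.e / t) if t <= 1 else t * math.exp(1.0 - t)

def entropy_of_W(a):
    """Exact (natural-log) entropy of W_a = sgn(Z_a)[|Z_a|], Z_a = sum_i eps_i a_i."""
    counts = Counter(
        math.trunc(sum(e * x for e, x in zip(eps, a)))   # trunc(z) = sgn(z) * floor(|z|)
        for eps in itertools.product((-1, 1), repeat=len(a))
    )
    total = 2 ** len(a)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

for scale in (0.05, 0.2, 0.3, 1.0, 3.0):
    a = [scale] * 10
    t = 1.0 / (2.0 * sum(x * x for x in a))   # the argument 1/(2|a|^2) of Phi
    print(scale, entropy_of_W(a), phi(t))
```

For small $|a|$ the variable $W_a$ is almost surely $0$ and its entropy vanishes, while for large $|a|$ both quantities grow only logarithmically.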

Proof of Lemma 3.2. Without loss of generality, assume that $T$ is
finite. Recall that $0\in T$ and that $T_0=\{0\}$, consider the sets $T_s$ and the maps
$\pi_s:T\to T_s$, let $\Delta_s(t)=\pi_s(t)-\pi_{s-1}(t)$ and put $A_s=\{\pi_s(t)-\pi_{s-1}(t) : t\in T\}$.
Let $(\lambda_s)_{s=1}^\infty$ and $(Q_s)_{s=1}^\infty$ be as in the assumptions of the lemma and let
$(\varepsilon_i)_{i=1}^n$ be independent, symmetric, $\{-1,1\}$-valued random variables.

Consider the Bernoulli process $Z_t=\sum_{i=1}^n\varepsilon_it_i$. Since $Z_t$ is linear in $t$
and $\pi_0(t)=0$, for every $t\in T$,

$$t=\sum_{s=1}^\infty\Delta_s(t)\quad\text{and}\quad Z_t=\sum_{s=1}^\infty\big(Z_{\pi_s(t)}-Z_{\pi_{s-1}(t)}\big)=\sum_{s=1}^\infty Z_{\Delta_s(t)}.$$
s=l s=l s=l 
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For every s > 1 and u £ As define 



\u\Qs' 



Wu,s = sgn{Wu,s)[\Wu,s\]- 



Observe that {Wu,s)u£As,s=i,2,... is a vector that takes a finite number of 
values. Since the entropy is subadditive then 

oo 

H{{Wu,s) -.ue As, S = 1,2,... )<Y,Y1 H(Wn,s) = (*). 

S=l UGAs 

Suppose that one can find {Qs)'^i for which (*) < n/100. By the properties 
of the entropy, this imphes that there are numbers {lu,s S S As,s = 
1,2,.. .} such that 

(3.1) Pr((e,)ti : V^z G A„ s > 1, Wu,s = iu,s) = Pr(^) > 2^"/ioo. 

Since \A\ > 2°-^^", there wih be at least two vectors (ei)f=i ^-i^d (e9?=i ™ ^ 
that differ on at most 3n/4 coordinates and on at least n/4 of them. The 

e—e' 

desired sequence win then be {vi)f=i = (^V^)r=i- Indeed, for n G A,, 



i=l 



i=l 



1=1 



< 



\U\ 



implying that every t£T satisfies 



i=l 



s>l 1=1 



<Y^Qs\Asit)\. 

s=l 



Hence, to complete the proof, it remains to show that for a sequence 
(Qs)^i that satisfies the assumptions of the lemma, (*) < n/100. Applying 
Lemma 3.3 for a = u/\u\Qs, and since l/2|a|2 = Ql/2, it is evident that 
HiWu,s)<HQl/2). Thus, 



E H{Wu,s)<Y,\As\ sup HiWu,s)<Y.Xsm'm: 

s=l u£As s = l 

proving our claim. □ 



u£As 



= 1 
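The last step of the argument above is purely combinatorial: two sign vectors lying in the same atom produce the partial coloring $\eta=(\varepsilon-\varepsilon')/2$. The small sketch below (helper names are ours, for illustration only) checks the identity $\sum_i\eta_iu_i=\frac12\big(\sum_i\varepsilon_iu_i-\sum_i\varepsilon'_iu_i\big)$ and the Hamming-distance condition; it does not reproduce the entropy/pigeonhole step that guarantees such a pair exists.

```python
from itertools import combinations


def partial_coloring(eps, eps_prime):
    """eta_i = (eps_i - eps'_i) / 2, which lies in {-1, 0, 1}.

    eta_i = 0 exactly on the coordinates where the two sign vectors agree.
    """
    return [(e - f) // 2 for e, f in zip(eps, eps_prime)]


def find_far_pair(vectors):
    """Return two sign vectors whose Hamming distance lies in [n/4, 3n/4], or None."""
    n = len(vectors[0])
    for x, y in combinations(vectors, 2):
        d = sum(a != b for a, b in zip(x, y))
        if n / 4 <= d <= 3 * n / 4:
            return x, y
    return None


eps, eps_prime = find_far_pair([(1, 1, 1, 1), (1, 1, 1, -1), (1, -1, -1, 1)])
eta = partial_coloring(eps, eps_prime)
u = (3, 1, 4, 1)
# sum_i eta_i u_i equals half the difference of the two Bernoulli sums.
lhs = sum(e * x for e, x in zip(eta, u))
rhs = (sum(e * x for e, x in zip(eps, u)) - sum(e * x for e, x in zip(eps_prime, u))) / 2
print(eta, lhs, rhs)
```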



We will apply Lemma 3.2 in two typical situations. The first will lead to a bound on the discrepancy of a set using the $\gamma_{2,s}$ functionals of the set and of its coordinate projections. The second will result in an entropy-integral type bound, presented in Section 3.1, which will then be used to re-prove Spencer's result on the discrepancy of a finite set system [11, 16] and Matoušek's VC theorem [10, 11].

Corollary 3.4 below will play a central part in the proof of Theorem A. 
Since it follows from a simple computation, we omit its proof. 
Corollary 3.4. There exist absolute constants $\kappa_3,\kappa_4,\kappa_5$ for which the following holds. Let $T\subset\mathbb{R}^n$, assume that $0\in T$, set $s_n=\max\{s:2^{2^s}\le\kappa_3n\}$ and put $(T_s)_{s\ge0}$ to be a collection of subsets of $T$ with $T_0=\{0\}$ and $|T_s|\le2^{2^s}$. Then, if

$$Q_s=\kappa_4\begin{cases}\exp\big(-\kappa_5(n/2^{2^s})^{1/2}\big),&\text{if }s<s_n,\\1,&\text{if }s=s_n,\\2^{s/2},&\text{if }s>s_n,\end{cases}$$

there exists $(\eta_i)_{i=1}^n\in\{-1,0,1\}^n$ such that $n/4\le|\{i:\eta_i=0\}|\le3n/4$, and for every $t\in T$,

$$\Big|\sum_{i=1}^n\eta_it_i\Big|\le\sum_{s=1}^\infty Q_s|\pi_s(t)-\pi_{s-1}(t)|,$$

where $\pi_s(t)$ is a nearest point to $t$ in $T_s$.



3.1. An entropy integral argument. In this section, we will prove an analog of Dudley's entropy integral bound (see, e.g., [8, 18]) in the context of discrepancy. The entropy integral is often used to upper bound $\sup_{t\in T}|\sum_{i=1}^n\varepsilon_it_i|$ for a typical $(\varepsilon_i)_{i=1}^n$, but here we will present a modified version that allows one to control $\inf_\eta\sup_{t\in T}|\sum_{i=1}^n\eta_it_i|$, where the infimum is taken with respect to all $\eta=(\eta_i)_{i=1}^n\in\{-1,0,1\}^n$ for which roughly half the coordinates are nonzero.

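Let $T\subset\ell_2^n$ and recall that for every $\varepsilon>0$, $D(\varepsilon)=D(\varepsilon,T,\ell_2)$ is the cardinality of a maximal $\varepsilon$-separated subset of $T$. Also, set

$$u(\varepsilon)=\begin{cases}\sqrt{\log\big(eD(\varepsilon)/n\big)},&\text{if }D(\varepsilon)>n,\\\exp\big(1-\sqrt{n/D(\varepsilon)}\big),&\text{if }D(\varepsilon)\le n.\end{cases}$$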



Theorem 3.5. There exists an absolute constant $c$ for which the following holds. If $T\subset\ell_2^n$ and $0\in T$, then there exists $(\eta_i)_{i=1}^n\in\{-1,0,1\}^n$ such that $n/4\le|\{i:\eta_i=0\}|\le3n/4$ and for every $t\in T$,

$$(3.2)\qquad\Big|\sum_{i=1}^n\eta_it_i\Big|\le c\int_0^{\mathrm{diam}(T)}u(\varepsilon)\,d\varepsilon.$$
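For a finite point set, the right-hand side of (3.2) can be approximated numerically. The sketch below is an illustration under two stated simplifications: $D(\varepsilon)$ is replaced by the size of a greedily built, inclusion-maximal $\varepsilon$-separated subset (which may be smaller than the largest one), and $u$ is taken to be $\sqrt{\log(eD/n)}$ for $D>n$ and $\exp(1-\sqrt{n/D})$ for $D\le n$. The constant $c$ of Theorem 3.5 is not computed, and all helper names are ours.

```python
import math


def greedy_separated_count(points, eps):
    """Size of an inclusion-maximal eps-separated subset, built greedily."""
    chosen = []
    for p in points:
        if all(math.dist(p, q) >= eps for q in chosen):
            chosen.append(p)
    return len(chosen)


def u_of(D, n):
    """The integrand u(eps), written as a function of D = D(eps) and the dimension n."""
    if D > n:
        return math.sqrt(math.log(math.e * D / n))
    return math.exp(1.0 - math.sqrt(n / D))


def entropy_integral(points, steps=2000):
    """Midpoint-rule approximation of the integral of u(eps) over [0, diam]."""
    n = len(points[0])
    diam = max(math.dist(p, q) for p in points for q in points)
    h = diam / steps
    total = 0.0
    for j in range(steps):
        eps = (j + 0.5) * h
        total += u_of(greedy_separated_count(points, eps), n) * h
    return total


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(entropy_integral(square))
```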



Remark 3.6. Recall that Dudley's entropy integral bound shows that

$$\mathbb{E}\sup_{t\in T}\Big|\sum_{i=1}^n\varepsilon_it_i\Big|\le c_1\int_0^{\mathrm{diam}(T)}\sqrt{\log D(\varepsilon)}\,d\varepsilon,$$

for a suitable absolute constant $c_1$. Clearly, this entropy integral may be considerably larger than the quantity we have in Theorem 3.5. It is also evident that if one could iterate Theorem 3.5 for the set $P_IT\subset\ell_2^{|I|}$, where $I=\{i:\eta_i=0\}$, and continue in the same manner, then one would likely improve upon the standard entropy integral bound that holds for a typical choice of signs, provided that distances in $P_IT$ indeed shrink relative to distances in $T$.

The proof of Theorem 3.5 is based on Lemma 3.2. It requires two addi- 
tional simple results. Since their proofs are standard, we shall not present 
them here. 

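Lemma 3.7. There exist absolute constants $c_1,c_2,c_3$ and $c_4$ for which the following holds. Let $T\subset\ell_2^n$, set $\nu_n$ to be the largest integer $s$ satisfying $2^s\le c_1n$ and define

$$\lambda_s=\begin{cases}c_22^s,&\text{if }s\le\nu_n,\\c_3n2^{2^{s-\nu_n}},&\text{if }s>\nu_n.\end{cases}$$

Then conditions (a) and (b) of Lemma 3.2 hold if one selects

$$Q_s=c_4\begin{cases}\exp\big(-2\cdot2^{(\nu_n-s)/2}\big),&\text{if }s<\nu_n,\\2^{(s-\nu_n)/2},&\text{if }s\ge\nu_n.\end{cases}$$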



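Lemma 3.8. Let $g$ and $f$ be nonincreasing, nonnegative functions and let $(\varepsilon_s)_{s=0}^m$ be a decreasing sequence. If for every $s\ge1$, $g(\varepsilon_{s-1})\ge f(\varepsilon_s)$, and if there is $\alpha>0$ such that for every $s\ge1$, $f(\varepsilon_s)-f(\varepsilon_{s-1})\ge\alpha f(\varepsilon_s)$, then

$$\int_{\varepsilon_m}^{\varepsilon_0}g(\varepsilon)\,d\varepsilon+\varepsilon_mf(\varepsilon_m)\ge\alpha\sum_{s=1}^mf(\varepsilon_s)\varepsilon_{s-1}.$$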
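Lemma 3.8 is a discrete summation-by-parts estimate. The sketch below checks it numerically for the step functions used in the proof of Theorem 3.5, namely $f(\varepsilon_s)=2^{s/2}$ and $g=2^{s/2}$ on $(\varepsilon_s,\varepsilon_{s-1}]$, for which $g(\varepsilon_{s-1})=f(\varepsilon_s)$ and $\alpha=1-2^{-1/2}$. The decreasing sequences are arbitrary test data, and the helper name is ours.

```python
def lemma_3_8_sides(eps):
    """Return (lhs, rhs) of Lemma 3.8 for f(eps_s) = 2**(s/2).

    eps must be decreasing: eps[0] > eps[1] > ... > eps[m] >= 0.
    g is the step function equal to 2**(s/2) on (eps_s, eps_{s-1}], so that
    g(eps_{s-1}) = f(eps_s), and f(eps_s) - f(eps_{s-1}) = alpha * f(eps_s)
    with alpha = 1 - 2**(-1/2).
    """
    m = len(eps) - 1
    alpha = 1 - 2 ** -0.5
    f = [2 ** (s / 2) for s in range(m + 1)]
    # The integral of the step function g over [eps_m, eps_0].
    integral_g = sum(f[s] * (eps[s - 1] - eps[s]) for s in range(1, m + 1))
    lhs = integral_g + eps[m] * f[m]
    rhs = alpha * sum(f[s] * eps[s - 1] for s in range(1, m + 1))
    return lhs, rhs


print(lemma_3_8_sides([1.0, 0.5, 0.25, 0.1, 0.0]))
print(lemma_3_8_sides([2.0, 1.9, 1.8, 1.7, 1.6]))
```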



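Proof of Theorem 3.5. Let $(\lambda_s)_{s\ge1}$ and $(Q_s)_{s\ge1}$ be as in Lemma 3.7. Without loss of generality assume that $T$ is a finite set and define the sets $T_s$ iteratively, as follows. Set $m$ to be the first integer such that $|T|\le\lambda_m$, let $T_s=T$ for $s\ge m$ and set $\varepsilon_m=0$. For $m-1$, let $\varepsilon_{m-1}=\inf\{\varepsilon:D(\varepsilon,T_m,\ell_2)\le\lambda_{m-1}\}$ and put $T_{m-1}$ to be a maximal $\varepsilon_{m-1}$-separated subset of $T_m$ whose cardinality is at most $\lambda_{m-1}$. Continue in this way to construct the sets $T_s$ for $s=m-1,\dots,1$. For every $s$, let $\pi_s(t)$ be a nearest point to $\pi_{s+1}(t)$ in $T_s$.

Let $s\le m$. Since the sets $T_s$ are nested, $|\{\pi_s(t)-\pi_{s-1}(t):t\in T\}|\le|T_s|$ and $|\pi_s(t)-\pi_{s-1}(t)|\le\varepsilon_{s-1}$ for every $t\in T$. Therefore, applying Lemmas 3.2 and 3.7, there is a choice $(\eta_i)_{i=1}^n\in\{-1,0,1\}^n$ with $n/4\le|\{i:\eta_i=0\}|\le3n/4$ such that for every $t\in T$,

$$(3.3)\qquad\Big|\sum_{i=1}^n\eta_it_i\Big|\le c_1\Big(\sum_{s<\nu_n}\exp\big(-2\cdot2^{(\nu_n-s)/2}\big)\varepsilon_{s-1}+\sum_{s\ge\nu_n}2^{(s-\nu_n)/2}\varepsilon_{s-1}\Big).$$

It remains to bound the sums in (3.3) by the appropriate integrals, using Lemma 3.8. First, for $s>\nu_n$, let

$$f(\varepsilon)=\sum_{s=\nu_n+1}^m2^{(s-\nu_n)/2}\mathbb{1}_{(\varepsilon_{s+1},\varepsilon_s]}(\varepsilon),\qquad g(\varepsilon)=\sum_{s=\nu_n+1}^m2^{(s-\nu_n)/2}\mathbb{1}_{(\varepsilon_s,\varepsilon_{s-1}]}(\varepsilon).$$

Clearly, in $[\varepsilon_m,\varepsilon_{\nu_n}]=[0,\varepsilon_{\nu_n}]$, $f$ and $g$ are nonincreasing and nonnegative, for every $\varepsilon$ in that range

$$f(\varepsilon)\le g(\varepsilon)\le\sqrt2f(\varepsilon)\le c\,u(\varepsilon),$$

and the conditions of Lemma 3.8 hold with $\alpha=1-2^{-1/2}$. Since $\varepsilon_m=0$,

$$\sum_{s=\nu_n+1}^m2^{(s-\nu_n)/2}\varepsilon_{s-1}\le\frac1\alpha\int_0^{\varepsilon_{\nu_n}}g(\varepsilon)\,d\varepsilon\le c'\int_0^{\mathrm{diam}(T)}u(\varepsilon)\,d\varepsilon.$$

For the other term in (3.3), if $s<\nu_n$ then $2^s\sim\lambda_s\le D(\varepsilon_s)$, and the sum is estimated in a similar way. □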



3.1.1. Spencer's theorem. Let us show how Theorem 3.5 can be used to prove a version of Spencer's celebrated result from [16] (see also [1, 11]).

Theorem 3.9. There exists an absolute constant $c$ such that if $T\subset B_\infty^n$ is of cardinality $m\ge n$, then

$$\mathrm{disc}(T)\le c\sqrt{n\log\Big(\frac{em}{n}\Big)}.$$

Proof. Without loss of generality, assume that $0\in T$. Using the notation of Theorem 3.5, for every $\varepsilon>0$, $u(\varepsilon)\le\sqrt{\log(em/n)}$, and since $T\subset B_\infty^n$, $\mathrm{diam}(T)\le2\sqrt n$. Hence, there is $(\eta_i)_{i=1}^n\in\{-1,0,1\}^n$ for which $n/4\le|\{i:\eta_i=0\}|\le3n/4$ and for every $t\in T$,

$$\Big|\sum_{i=1}^n\eta_it_i\Big|\le c_1\int_0^{2\sqrt n}\sqrt{\log(em/n)}\,d\varepsilon\le2c_1\sqrt{n\log(em/n)}.$$

Now the result follows by repeating this argument for $P_{I_1}T$, where $I_1=\{i:\eta_i=0\}$, and so on: since $|I_j|\le(3/4)^jn$, the bounds obtained at the successive steps sum to at most $c\sqrt{n\log(em/n)}$. □
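For very small examples, $\mathrm{disc}(T)=\min_{\varepsilon\in\{-1,1\}^n}\max_{t\in T}|\sum_i\varepsilon_it_i|$ can be computed by exhaustive search, which is a convenient sanity check for bounds such as Theorem 3.9. The sketch below is exponential in $n$ and is meant only as an illustration; the function name is ours.

```python
import math
from itertools import product


def disc(T):
    """min over sign vectors of max_{t in T} |sum_i eps_i t_i| (exhaustive search)."""
    n = len(T[0])
    return min(
        max(abs(sum(e * x for e, x in zip(eps, t))) for t in T)
        for eps in product((-1, 1), repeat=n)
    )


# Four "intervals" of length 2 on a cycle of length 4: alternating signs balance all of them.
cycle = [(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1), (1, 0, 0, 1)]
print(disc(cycle), math.sqrt(4 * math.log(math.e * len(cycle) / 4)))
```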



3.1.2. Matoušek's VC theorem. A well-known measure of complexity for subsets of $\{0,1\}^n$ is the VC dimension of the set (its real-valued counterpart will be used in Section 6).

Definition 3.10. Let $T\subset\{0,1\}^n$. We say that $\sigma\subset\{1,\dots,n\}$ is shattered by $T$ if $P_\sigma T=\{0,1\}^\sigma$, that is, if the coordinate projection $P_\sigma T=\{(t_i)_{i\in\sigma}:t\in T\}$ is the entire combinatorial cube on these coordinates. Define $VC(T)$ to be the maximal cardinality of a subset of $\{1,\dots,n\}$ that is shattered by $T$.
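Definition 3.10 translates directly into an exhaustive check. The sketch below (our own helper, exponential in $n$) uses the fact that every subset of a shattered set is shattered, so it can stop at the first cardinality for which no set is shattered.

```python
from itertools import combinations


def vc_dimension(T):
    """Largest |sigma| such that the projection of T onto sigma is all of {0,1}^sigma."""
    T = [tuple(t) for t in T]
    n = len(T[0])
    best = 0
    for size in range(1, n + 1):
        if not any(
            len({tuple(t[i] for i in sigma) for t in T}) == 2 ** size
            for sigma in combinations(range(n), size)
        ):
            break  # no set of this size is shattered, hence no larger one either
        best = size
    return best


# Indicators of discrete intervals {a, ..., b-1} in {0, 1, 2, 3}, including the empty one.
intervals = [tuple(1 if a <= i < b else 0 for i in range(4)) for a in range(5) for b in range(a, 5)]
print(vc_dimension(intervals))
```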

In [10], Matoušek proved that the discrepancy of a VC class is polynomially better than could be expected from a random choice of signs. He obtained the best possible estimate for the discrepancy of VC-subsets of $\{0,1\}^n$ as a function of the dimension $n$.

Theorem 3.11. For every integer $d$, there is a constant $c(d)$ for which the following holds. If $T\subset\{0,1\}^n$ and $VC(T)\le d$, then $\mathrm{disc}(T)\le c(d)n^{1/2-1/2d}$.

To prove Matoušek's theorem, recall the following fundamental property of a VC class, due to Haussler [7].

Lemma 3.12. If $T\subset\{0,1\}^n$ and $VC(T)\le d$, then for every $I\subset\{1,\dots,n\}$ and every $0<\varepsilon\le|I|^{1/2}$,

$$D(\varepsilon,P_IT,\ell_2)\le c(d)\Big(\frac{|I|^{1/2}}{\varepsilon}\Big)^{2d},$$

where $c(d)$ is a constant that depends only on $d$.

Proof of Theorem 3.11. Again, we may assume that $0\in T$ and view $T$ as a subset of $\mathbb{R}^n$. Let $\varepsilon_n=\inf\{\varepsilon:D(\varepsilon)\le n\}$. Therefore, by Lemma 3.12, $\varepsilon_n\le c_1(d)n^{1/2-1/2d}$. A change of variables shows that

$$\int_0^{\varepsilon_n}\sqrt{\log\big(eD(\varepsilon)/n\big)}\,d\varepsilon\le c_2(d)n^{1/2-1/2d}\qquad\text{and}\qquad\int_{\varepsilon_n}^{\mathrm{diam}(T)}\exp\big(1-\sqrt{n/D(\varepsilon)}\big)\,d\varepsilon\le c_2(d)n^{1/2-1/2d}.$$

Hence, by Theorem 3.5, there is a choice of $(\eta_i)_{i=1}^n\in\{-1,0,1\}^n$ such that for every $t\in T$,

$$\Big|\sum_{i=1}^n\eta_it_i\Big|\le c_3(d)n^{1/2-1/2d},$$

and if we set $I_1=\{i:\eta_i=0\}$ then $|I_1|\le3n/4$. Since $VC(P_{I_1}T)\le d$, repeating the same argument for the set $P_{I_1}T$, there are $(\eta_i^1)_{i\in I_1}\in\{-1,0,1\}^{I_1}$ such that for every $t\in T$, $|\sum_{i\in I_1}\eta_i^1t_i|\le c_3(d)|I_1|^{1/2-1/2d}$, and so on. Therefore, there is a choice of signs $(\varepsilon_i)_{i=1}^n$ such that for every $t\in T$,

$$\Big|\sum_{i=1}^n\varepsilon_it_i\Big|\le c_3(d)\sum_j|I_j|^{1/2-1/2d}\le c_4(d)n^{1/2-1/2d},$$

where we have used the fact that for every $j$, $|I_j|\le3|I_{j-1}|/4$. □

The proof of Theorem 3.11 illustrates once again the main property we used to bound the discrepancy of a subset of $\mathbb{R}^n$. It is not enough for the set to be small in the sense of its metric entropy; what is needed is additional control on the "size" of all of the set's coordinate projections. One way of controlling those coordinate projections is by taking into account information about the position of vectors in the set, since coordinate projections of vectors in a good position shrink norms and mutual distances.

4. A decomposition theorem for subgaussian processes. It is clear from our estimate on the discrepancy of a set $T\subset\mathbb{R}^n$ that it would be useful to control the $\ell_2^I$ distances between points in $T$ for every $I\subset\{1,\dots,n\}$, that is, distances between coordinate projections of elements of $T$. One would be able to obtain a good bound on $\mathrm{disc}(T)$ if $T$ is not too rich and if for every $I\subset\{1,\dots,n\}$ and every $x,y\in T$, $\|x-y\|_{\ell_2^I}$ is significantly smaller than $\|x-y\|_{\ell_2^n}$. Unfortunately, this is usually not true even for a single vector $z=x-y$. Indeed, if $z$ is supported in $I$ then $\|z\|_{\ell_2^I}$ does not "shrink" at all. On the other hand, if the coordinates of $z$ are roughly equal, then the coordinate projection onto any $I$ shrinks the norm of $z$ by a factor of $(|I|/n)^{1/2}$.

It is well known that a strong shrinking phenomenon is exhibited by vectors in a general position. In this section, we will show that if a class of functions $F$ is $L$-subgaussian, then a shrinking phenomenon happens for a typical set

$$T=P_\sigma F=\{(f(X_i))_{i=1}^k:f\in F\},$$

uniformly for all coordinate projections of $T$.

4.1. Shrinking for a single function. As a starting point, let us describe the so-called "standard shrinking" phenomenon for a single function $f$. Let $f$ be a function for which $\|f\|_{\psi_2}\le L\|f\|_{L_2}$. Then, concentration implies that

$$\Big(\frac1k\sum_{i=1}^kf^2(X_i)\Big)^{1/2}\sim\|f\|_{L_2}$$

with high probability.

However, as we mentioned above, the shrinking phenomenon one needs here is more general: for every subset $I\subset\{1,\dots,k\}$, the norm of $f$ in $L_2^I$ is upper bounded (possibly up to a logarithmic factor) by $\|f\|_{L_2}$ (which translates, in the $\ell_2$ normalization, to the shrinking of the norm). The following lemma shows that this stronger claim is true as well whenever $f$ is $L$-subgaussian.
Lemma 4.1. For every $0<\delta<1$ and $L>0$, there is a constant $c(\delta,L)$ for which the following holds. If $\|f\|_{\psi_2}\le L\|f\|_{L_2}$, then for every integer $k$, with probability at least $1-\delta$, for every $I\subset\{1,\dots,k\}$,

$$\|f\|_{L_2^I}\le c(\delta,L)\sqrt{\log(ek/|I|)}\,\|f\|_{L_2},\qquad\text{where}\quad\|f\|_{L_2^I}=\Big(\frac{1}{|I|}\sum_{i\in I}f^2(X_i)\Big)^{1/2}.$$

Proof. Fix $k$ and $I\subset\{1,\dots,k\}$. Since $\|f^2\|_{\psi_1}=\|f\|_{\psi_2}^2$, then by Bernstein's inequality, for every $t>0$,

$$\Pr\Big(\Big|\frac{1}{|I|}\sum_{i\in I}f^2(X_i)-\mathbb{E}f^2\Big|>t\|f\|_{\psi_2}^2\Big)\le2\exp\big(-c_0|I|\min\{t^2,t\}\big).$$

Let $m\le c_1k$ and recall that there are at most $(ek/m)^m$ subsets of $\{1,\dots,k\}$ of cardinality $m$. Hence, it suffices to take $t=\beta(\delta)\log(ek/m)\ge1$ to obtain that, with probability at least $1-2\exp(-c_0\beta m\log(ek/m))$, for every subset $I$ of $\{1,\dots,k\}$ of cardinality $m$,

$$(4.1)\qquad\|f\|_{L_2^I}=\Big(\frac1m\sum_{i\in I}f^2(X_i)\Big)^{1/2}\le c_2(\delta)L\sqrt{\log(ek/m)}\,\|f\|_{L_2}.$$

Therefore, summing the probabilities with respect to $m$, it follows that for the correct choice of $\beta$, with probability at least $1-\delta$, (4.1) is true for all subsets of $\{1,\dots,k\}$ of cardinality at most $c_1k$. The claim now easily follows.
□
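Two elementary facts drive the proof of Lemma 4.1: for a fixed sample, the largest value of $\|f\|_{L_2^I}$ over $|I|=m$ is attained on the $m$ largest values of $|f(X_i)|$, and the number of such sets is at most $(ek/m)^m$. The sketch below checks both; the helper names are ours.

```python
import math


def max_subset_norm(values, m):
    """max over |I| = m of (1/m * sum_{i in I} values_i**2) ** 0.5."""
    top = sorted((v * v for v in values), reverse=True)[:m]
    return math.sqrt(sum(top) / m)


def binomial_bound_holds(k):
    """Check C(k, m) <= (e k / m)**m for every 1 <= m <= k."""
    return all(math.comb(k, m) <= (math.e * k / m) ** m for m in range(1, k + 1))


print(max_subset_norm([3.0, -1.0, 2.0, 0.0], 2))
print(binomial_bound_holds(60))
```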



4.2. Shrinking for a class of functions. When one attempts to generalize 
this simple shrinking argument to a class of functions, one faces a problem: 
the probabilistic estimate obtained in the proof of Lemma 4.1 does not 
allow one to control many functions simultaneously. Thus, a naive extension 
of that result is simply too weak to lead to a function class analog of the 
shrinking phenomenon. 

To formulate the shrinking phenomenon for an $L$-subgaussian class of functions, let us recall some notation. For any two sets $A$ and $B$ in a vector space, $A+B=\{a+b:a\in A,\ b\in B\}$, and for a class of functions $F$, a random sample $\sigma=(X_1,\dots,X_k)$ and $I\subset\{1,\dots,k\}$,

$$P_\sigma F=\{(f(X_i))_{i=1}^k:f\in F\},\qquad P_IF=\{(f(X_i))_{i\in I}:f\in F\}.$$

For every integer $m$, let $W_m=\{x\in\mathbb{R}^m:x_j^*\le1/\sqrt j,\ j=1,\dots,m\}$, where $(x_j^*)_{j=1}^m$ is a monotone nonincreasing rearrangement of $(|x_j|)_{j=1}^m$. Thus, $W_m$ is the unit ball of the weak $\ell_2$ space $\ell_{2,\infty}^m$. Denote by $V_m$ the collection of all subsets of $\{1,\dots,k\}$ of cardinality at most $m$ and set $\tau_m$ to be the smallest integer $s$ such that $2^{2^s}\ge\exp(m\log(ek/m))\ge|V_m|$.
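Membership in $W_m$ depends only on the nonincreasing rearrangement $x^*$. The following sketch (illustrative helper names) tests membership and computes the weak-$\ell_2$ quasi-norm $\max_jx_j^*\sqrt j$, so that $x\in rW_m$ exactly when this quantity is at most $r$.

```python
import math


def weak_l2_norm(x):
    """max_j x*_j * sqrt(j), where x* is the nonincreasing rearrangement of |x_i|."""
    xs = sorted((abs(v) for v in x), reverse=True)
    return max((v * math.sqrt(j) for j, v in enumerate(xs, start=1)), default=0.0)


def in_weak_l2_ball(x, radius=1.0):
    """True iff x*_j <= radius / sqrt(j) for every j."""
    xs = sorted((abs(v) for v in x), reverse=True)
    return all(v <= radius / math.sqrt(j) for j, v in enumerate(xs, start=1))


m = 100
extremal = [1 / math.sqrt(j) for j in range(1, m + 1)]
# This extreme point of W_m has Euclidean norm (sum_j 1/j)**0.5, of order sqrt(log m).
print(in_weak_l2_ball(extremal), math.sqrt(sum(v * v for v in extremal)))
```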
Theorem 4.2. For every $0<\delta<1$ and $L>0$ there exist constants $c_1$, $c_2$, $c_3$ and $k_0$ depending only on $L$ and $\delta$ for which the following holds. Let $F$ be an $L$-subgaussian class of functions and assume that for each $f\in F$, $\mathbb{E}f=a$ for some $a\in\mathbb{R}$. Then, for every integer $k$ and every $m\le k$, there are sets $F_1^m$ and $F_2^m\subset F$ with the following properties. First, $F\subset F_1^m+F_2^m$; second, with $\mu^k$-probability at least $1-\delta$, if $\sigma=(X_1,\dots,X_k)$ then:

1. For every integer $m\le k$ and every $I\subset\{1,\dots,k\}$ of cardinality $m$,
$$P_IF_1^m\subset c_1\gamma_{2,\tau_m}(F,L_2)W_m.$$

2. For every $f,h\in F_2^m$ and every $I\subset\{1,\dots,k\}$ of cardinality $m$,
$$\|f-h\|_{L_2^I}\le c_2\sqrt{\log(ek/m)}\,\|f-h\|_{L_2}.$$

3. If $k\ge k_0$ then for every $m\le c_3k$ and every $f,h\in F_2^m$,
$$\|f-h\|_{L_2}\le\sqrt2\,\|f-h\|_{L_2^k}.$$

The way Theorem 4.2 should be understood is as follows. Consider a typical $\sigma=(X_1,\dots,X_k)$ and let $T=P_\sigma F\subset\ell_2^k$. Then, for every $I\subset\{1,\dots,k\}$ the further coordinate projection satisfies $P_IT\subset P_IT_1+P_IT_2$, where $T_1,T_2\subset\ell_2^k$ depend only on the cardinality of $I$ and not on $I$ itself, and $T_2\subset T$. The set $P_IT_1$ captures the "peaky" part of $P_IT$ and is contained in a relatively small set: a ball in $\ell_{2,\infty}$ whose radius depends on the "complexity" of the class $F$. The set $T_2$ consists of vectors that satisfy the desired shrinking property. Indeed, for every $(f(X_i))_{i=1}^k,(h(X_i))_{i=1}^k\in T_2$ and every $I\subset\{1,\dots,k\}$ of cardinality $m$ one has

$$\Big(\sum_{i\in I}\big(f(X_i)-h(X_i)\big)^2\Big)^{1/2}\le c_1\sqrt{m\log(ek/m)}\,\|f-h\|_{L_2}\le c_2\sqrt{\log(ek/m)}\Big(\frac mk\sum_{i=1}^k\big(f(X_i)-h(X_i)\big)^2\Big)^{1/2},$$

where the last inequality holds if $m\le c_3k$.

Proof of Theorem 4.2. Fix an integer $k$. For every integer $m\le k$, let $(H_{s,m})_{s\ge0}$ be an almost optimal admissible sequence of $F$ with respect to $\gamma_{2,\tau_m}(F,\psi_2)$, and set $\pi_s^m$ to be the metric projection onto $H_{s,m}$ with respect to the $\psi_2$ norm. For every such $m$ we will construct two sets of functions, $F_1^m$ and $F_2^m$, such that $F\subset F_1^m+F_2^m$, as follows: let $F_1^m=\{f-\pi_{\tau_m}^m(f):f\in F\}$ and set $F_2^m=\{\pi_{\tau_m}^m(f):f\in F\}$ [and from here on we will omit the superscript $m$ and write $\pi_s(f)$ instead of $\pi_s^m(f)$]. Note that this choice of decomposition depends only on $m$ and does not depend on $k$.
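For every $I\in V_m$ set $Z_I(f)=\sum_{i\in I}(f(X_i)-\mathbb{E}f)$ and observe that

$$\sum_{i\in I}\big(f-\pi_{\tau_m}(f)\big)(X_i)=Z_I(f)-Z_I\big(\pi_{\tau_m}(f)\big)=\sum_{s>\tau_m}\Big(Z_I\big(\pi_s(f)\big)-Z_I\big(\pi_{s-1}(f)\big)\Big)=\sum_{s>\tau_m}\sum_{i\in I}\big(\pi_s(f)-\pi_{s-1}(f)\big)(X_i),$$

since the expectation of all the functions in $F$ is the same. Thus, for every $f\in F$, $\pi_s(f)-\pi_{s-1}(f)$ has mean zero, and for every $s>\tau_m$ and every $t\ge1$,

$$\Pr\Big(\Big|\sum_{i\in I}\big(\pi_s(f)-\pi_{s-1}(f)\big)(X_i)\Big|>t\|\pi_s(f)-\pi_{s-1}(f)\|_{\psi_2}\sqrt{|I|}\Big)\le2\exp(-c_0t^2).$$

Let $t=u2^{s/2}$ for $u\ge c_1$, where $c_1$ is a constant to be named later. Because $s>\tau_m$, $|V_m|\le2^{2^s}$ and $|H_{s,m}|\cdot|H_{s-1,m}|\le2^{2^{s+1}}$, and thus

$$\Pr\Big(\exists f\in F,\ I\in V_m:\ \Big|\sum_{i\in I}\big(\pi_s(f)-\pi_{s-1}(f)\big)(X_i)\Big|>u2^{s/2}\|\pi_s(f)-\pi_{s-1}(f)\|_{\psi_2}\sqrt{|I|}\Big)\le2^{2^{s+1}}|V_m|\cdot2\exp(-c_0u^22^s)\le\exp(-c_2u^22^s).$$

Hence, summing over $s>\tau_m$, it follows that with probability at least

$$1-\sum_{s>\tau_m}\exp(-c_2u^22^s)\ge1-\exp(-c_3u^22^{\tau_m}),$$

for every $f\in F$ and every $I\in V_m$,

$$\Big|\sum_{i\in I}\big(f-\pi_{\tau_m}(f)\big)(X_i)\Big|\le u\sqrt{|I|}\sum_{s>\tau_m}2^{s/2}\|\pi_s(f)-\pi_{s-1}(f)\|_{\psi_2}.$$

Summing the probabilities for all possible integers $1\le m\le k$ and noting that for every $1\le m\le k$, $2^{\tau_m}\ge m\log(ek/m)$, it is evident that for $u\ge c_1$ there is a set $A\subset\Omega^k$ with probability at least $1-\exp(-c_4u^2)$ for which the following holds. For every $\sigma\in A$, every $1\le m\le k$, every $h\in F_1^m$ and every $I\in V_m$,

$$\Big|\sum_{i\in I}h(X_i)\Big|\le2u\sqrt{|I|}\,\gamma_{2,\tau_m}(F,\psi_2),$$

where we have used the fact that $(H_{s,m})_{s\ge0}$ is an almost optimal admissible sequence with respect to $\gamma_{2,\tau_m}(F,\psi_2)$.

Fix $(X_1,\dots,X_k)\in A$, $1\le m\le k$, $I\subset\{1,\dots,k\}$ of cardinality $m$ and $h\in F_1^m$. Consider the sets $I_+=\{i\in I:h(X_i)\ge0\}$ and $I_-=\{i\in I:h(X_i)<0\}$ and note that both are in $V_m$. Since $x^{1/2}$ is increasing, then on the set $A$

$$(4.2)\qquad\sum_{i\in I}|h(X_i)|=\Big|\sum_{i\in I_+}h(X_i)\Big|+\Big|\sum_{i\in I_-}h(X_i)\Big|\le4u\sqrt{|I|}\,\gamma_{2,\tau_m}(F,\psi_2).$$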
In particular, if $(h_j^*)_{j=1}^m$ is a nonincreasing rearrangement of $(|h(X_i)|)_{i\in I}$, then by (4.2) applied to the set $I_j$ consisting of the $j\le m$ largest elements,

$$h_j^*\le\frac1j\sum_{i=1}^jh_i^*\le\frac1j\cdot4uj^{1/2}\gamma_{2,\tau_m}(F,\psi_2)\le4Lu\,\gamma_{2,\tau_m}(F,L_2)/\sqrt j;$$

thus, $P_IF_1^m\subset4Lu\,\gamma_{2,\tau_m}(F,L_2)W_m$.

Turning our attention to the sets $F_2^m$, we will show that with high probability, for every $1\le m\le k$ and every $I\subset\{1,\dots,k\}$ of cardinality $m$, the coordinate projection $P_I:(F_2^m,L_2)\to(F_2^m,L_2^I)$ has a well-behaved Lipschitz constant. To that end, fix $1\le m\le k$, set $G_m=\{|f_1-f_2|:f_1,f_2\in F_2^m\}$ and recall that for every function $g$, $\|g^2\|_{\psi_1}=\|g\|_{\psi_2}^2$. Hence, by Bernstein's inequality, for every $g\in G_m$, every $I$ of cardinality $m$ and every $t\ge1$,

$$\Pr\Big(\Big|\frac1m\sum_{i\in I}g^2(X_i)-\mathbb{E}g^2\Big|>t\|g\|_{\psi_2}^2\Big)\le2\exp\big(-c_0m\min(t^2,t)\big).$$

Let $E_m$ be the collection of subsets of $\{1,\dots,k\}$ of cardinality $m$. Since $|G_m|\le|F_2^m|^2\le2^{2^{\tau_m+1}}$ and $|E_m|\le|V_m|\le2^{2^{\tau_m}}$, then by taking $u\ge c_5$ and $t=u\log(ek/m)\ge1$,

$$\Pr\Big(\exists g\in G_m,\ I\in E_m:\ \Big|\frac1m\sum_{i\in I}g^2(X_i)-\mathbb{E}g^2\Big|>u\|g\|_{\psi_2}^2\log(ek/m)\Big)\le2^{2^{\tau_m+2}}\exp\big(-c_0um\log(ek/m)\big)\le\exp\big(-c_6um\log(ek/m)\big).$$

Summing over all possible $1\le m\le k$, there is a subset $B\subset\Omega^k$ of probability at least $1-\exp(-c_7u)$ on which the following holds. For every $1\le m\le k$, every $f_1,f_2\in F_2^m$ and every $I\in E_m$,

$$(4.3)\qquad\|f_1-f_2\|_{L_2^I}^2\le\|f_1-f_2\|_{L_2}^2+u\log(ek/m)\|f_1-f_2\|_{\psi_2}^2\le2L^2u\log(ek/m)\|f_1-f_2\|_{L_2}^2.$$

Thus, fix a "legal" choice of $u$ for which $\Pr(A\cap B)\ge1-\delta/2$. Since both (4.2) and (4.3) hold on that event, the first and second claims follow.

For the third part, fix $t\le1/2$ to be named later. Again, by Bernstein's inequality and since $F$ is $L$-subgaussian, with probability at least $1-2|F_2^m|^2\exp(-c_0kt^2)$, for every $f_1,f_2\in F_2^m$,

$$\|f_1-f_2\|_{L_2}^2\le\|f_1-f_2\|_{L_2^k}^2+tL^2\|f_1-f_2\|_{L_2}^2.$$

Thus, taking $t=1/(2L^2)$, for $k\ge k_0(\delta,L)$ and $m\le c_8(L)k$, it is evident that with probability at least $1-2\exp(-c_9(L)k)\ge1-\delta/2$, for every $f_1,f_2\in F_2^m$,

$$\|f_1-f_2\|_{L_2}^2\le2\|f_1-f_2\|_{L_2^k}^2,$$
as claimed. □

4.3. Shrinking properties of the $\gamma_{2,s}$ functionals. The first corollary of Theorem 4.2 we shall present here is a shrinking property of $\gamma_{2,s}(F,L_2)$.

Theorem 4.3. For every $0<\delta<1$ there exists a constant $c(\delta)\sim\log(2/\delta)$ for which the following holds. Let $F$ be an $L$-subgaussian class of functions on a probability space $(\Omega,\mu)$ and assume that for every $f\in F$, $\mathbb{E}f=a$ for some $a\in\mathbb{R}$. Then, with probability at least $1-\delta$, for every $I\subset\{1,\dots,k\}$ and every integer $s$ that satisfies $2^s\le|I|\log(ek/|I|)$,

$$\gamma_{2,s+1}(F,L_2^I)\le c(\delta)L\,\gamma_{2,s}(F,L_2)\sqrt{\log(ek/|I|)}.$$

Before proving Theorem 4.3, recall the following well-known result on the expectation of a monotone rearrangement of independent standard Gaussian variables (see, e.g., [5, 6]).

Lemma 4.4. Let $(g_i)_{i=1}^n$ be independent standard Gaussian variables and denote by $(g_i^*)_{i=1}^n$ the nonincreasing rearrangement of $(|g_i|)_{i=1}^n$. Then,

$$\mathbb{E}g_i^*\sim\begin{cases}\sqrt{\log(2n/i)},&\text{if }i\le n/2,\\\dfrac{n+1-i}{n},&\text{if }i>n/2.\end{cases}$$

Moreover,

$$\mathbb{E}\Big(\sum_{i=1}^m(g_i^*)^2\Big)^{1/2}\sim\sqrt{m\log(en/m)}.$$
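A quick Monte Carlo experiment illustrates Lemma 4.4; it does not prove it, and the constants hidden in $\sim$ are not estimated. The sketch uses only Python's standard library, and the sample sizes and seed are arbitrary choices of ours.

```python
import math
import random


def mean_order_statistics(n, trials, seed=0):
    """Monte Carlo estimates of E g*_i (i = 1..n) and of E (sum_{i<=m} (g*_i)^2)^(1/2) (m = 1..n).

    g* is the nonincreasing rearrangement of |g_1|, ..., |g_n|, with g_i standard Gaussian.
    """
    rng = random.Random(seed)
    mean_g = [0.0] * n
    mean_top = [0.0] * n
    for _ in range(trials):
        sample = sorted((abs(rng.gauss(0.0, 1.0)) for _ in range(n)), reverse=True)
        running = 0.0
        for i, v in enumerate(sample):
            mean_g[i] += v / trials
            running += v * v
            mean_top[i] += math.sqrt(running) / trials
    return mean_g, mean_top


n = 1000
mean_g, mean_top = mean_order_statistics(n, trials=200)
for i in (1, 10, 100, 500):
    print(i, mean_g[i - 1], math.sqrt(math.log(2 * n / i)))
for m in (10, 100):
    print(m, mean_top[m - 1], math.sqrt(m * math.log(math.e * n / m)))
```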

Proof of Theorem 4.3. Fix $0<\delta<1$ and let the sets $A,B\subset\Omega^k$ be as in the proof of Theorem 4.2. Take any $(X_1,\dots,X_k)\in A\cap B$, let $I\subset\{1,\dots,k\}$ and set $m=|I|$. Since $P_IF\subset P_IF_1^m+P_IF_2^m$, then by the subadditivity of $\gamma_{2,s}$, it is evident that for every integer $s$,

$$\gamma_{2,s+1}(F,L_2^I)\le\gamma_{2,s}(F_1^m,L_2^I)+\gamma_{2,s}(F_2^m,L_2^I).$$

By (4.3), the mapping $P_I:(F_2^m,L_2)\to(F_2^m,L_2^I)$ is a Lipschitz function with a constant $c(\delta)L(\log(ek/m))^{1/2}$. Therefore, recalling that $F_2^m\subset F$,

$$(4.4)\qquad\gamma_{2,s}(F_2^m,L_2^I)\le c_1\sqrt{\log(ek/m)}\,\gamma_{2,s}(F_2^m,L_2)\le c_1\sqrt{\log(ek/m)}\,\gamma_{2,s}(F,L_2),$$

where $c_1=c_1(L,\delta)\sim L\log(2/\delta)$. To conclude the proof, observe that by Theorem 4.2, $P_IF_1^m\subset B_mW_m$, where $B_m=c_2(L,\delta)\gamma_{2,\tau_m}(F,L_2)$ and $W_m=\{x\in\mathbb{R}^m:x_j^*\le1/\sqrt j,\ j=1,\dots,m\}$.

Since the $\gamma_{2,s}$ functionals are monotone with respect to inclusion and are decreasing in $s$, and since $\|x\|_{L_2^I}=|x|/\sqrt m$ for every $x\in\mathbb{R}^m$, then

$$\gamma_{2,s}(F_1^m,L_2^I)\le\frac{B_m}{\sqrt m}\,\gamma_2(W_m,|\cdot|).$$

Applying the Majorizing Measures theorem and Lemma 4.4,

$$\gamma_2(W_m,|\cdot|)\le c_3\mathbb{E}\sup_{w\in W_m}\sum_{i=1}^mg_iw_i=c_3\mathbb{E}\sum_{i=1}^m\frac{g_i^*}{\sqrt i}\le c_4\sqrt m.$$

Hence, for every $s$, $\gamma_{2,s}(F_1^m,L_2^I)\le c_4B_m$, implying that for every $s\le\tau_m$, $\gamma_{2,s}(F_1^m,L_2^I)\le c_5(L,\delta)\gamma_{2,s}(F,L_2)$. Combining this with (4.4), it follows that for every $I\subset\{1,\dots,k\}$,

$$\gamma_{2,s+1}(F,L_2^I)\le c_6(L,\delta)\sqrt{\log(ek/|I|)}\,\gamma_{2,s}(F,L_2),$$

as claimed. □
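The identity $\sup_{w\in W_m}\sum_ig_iw_i=\sum_ig_i^*/\sqrt i$ used in the proof above holds because, by the rearrangement inequality, the best $w$ places the largest admissible weight $1/\sqrt j$ on the $j$-th largest $|g_i|$, with matching signs. The sketch below (hypothetical helper names) builds this maximizer explicitly and compares it with the closed form.

```python
import math


def weak_ball_maximizer(g):
    """w in W_m maximizing sum_i g_i w_i: w_i = sign(g_i) / sqrt(rank of |g_i|)."""
    order = sorted(range(len(g)), key=lambda i: abs(g[i]), reverse=True)
    w = [0.0] * len(g)
    for rank, i in enumerate(order, start=1):
        w[i] = math.copysign(1.0 / math.sqrt(rank), g[i])
    return w


def sup_over_weak_ball(g):
    """Closed form: sum_j g*_j / sqrt(j)."""
    gs = sorted((abs(v) for v in g), reverse=True)
    return sum(v / math.sqrt(j) for j, v in enumerate(gs, start=1))


g = [3.0, -1.0, 2.0]
w = weak_ball_maximizer(g)
print(w, sum(a * b for a, b in zip(g, w)), sup_over_weak_ball(g))
```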



Remark 4.5. The proof of Theorem 4.3 yields a stronger result than the one formulated. It shows that with probability $1-\delta$, for every $I\subset\{1,\dots,k\}$ and every $s\ge0$,

$$\gamma_{2,s+1}(F,L_2^I)\le c(L,\delta)\Big(\gamma_{2,\tau_{|I|}}(F,L_2)+\sqrt{\log(ek/|I|)}\,\gamma_{2,s}(F,L_2)\Big).$$

Observe that in some sense, the range $s\le\tau_{|I|}$ [i.e., $2^s\lesssim|I|\log(ek/|I|)$] is the interesting range of $s$, since

$$\gamma_{2,s}(P_IF,|\cdot|)\le\mathrm{diam}(P_IF,|\cdot|)\,\gamma_{2,s}(B_2^{|I|},|\cdot|),$$

which decreases exponentially in $s$ for $2^s\ge c_1|I|$.



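Another outcome of Theorem 4.2 was formulated as Corollary B in the Introduction.

Corollary 4.6. For every $0<\delta<1$ and $L>0$, there exists a constant $c(\delta,L)$ such that the following holds. Let $\mu$ be an isotropic, $L$-subgaussian measure on $\mathbb{R}^n$, set $(X_i)_{i=1}^k$ to be independent, distributed according to $\mu$, and consider the random operator $\Gamma=\sum_{i=1}^k\langle X_i,\cdot\rangle e_i$. If $T\subset\mathbb{R}^n$ and $V=k^{-1/2}\Gamma T$, then with $\mu^k$-probability at least $1-\delta$, for every $I\subset\{1,\dots,k\}$,

$$\mathbb{E}\sup_{v\in V}\Big|\sum_{i\in I}g_iv_i\Big|\le c(\delta,L)\sqrt{\frac{|I|}{k}\log\Big(\frac{ek}{|I|}\Big)}\;\mathbb{E}\sup_{t\in T}\Big|\sum_{i=1}^ng_it_i\Big|,$$

where the expectation on both sides is with respect to the Gaussian variables.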
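The proof of Corollary 4.6 follows from Theorem 4.2 and the Majorizing Measures theorem.

Proof of Corollary 4.6. Since $\mu$ is an $L$-subgaussian measure on $\mathbb{R}^n$, each $t\in\mathbb{R}^n$ corresponds to a function $f_t(x)=\langle t,x\rangle$, $f_t:\mathbb{R}^n\to\mathbb{R}$, for which $\|f_t\|_{\psi_2}\le L|t|$. Let $F=\{f_t:t\in T\}$, set $\Omega=\mathbb{R}^n$ and put $\sigma=(X_1,\dots,X_k)\in\Omega^k$ for which the assertion of Theorem 4.2 holds.

Fix $I\subset\{1,\dots,k\}$ of cardinality $m$. Since $F$ is a class of linear functionals, the decomposition of $F$ given in Theorem 4.2 actually implies a decomposition of $T$, which we denote by $T_1^m$ and $T_2^m$. Thus, for every $t\in T$, $t=t^1+t^2$, where $t^i\in T_i^m$ for $i=1,2$. Since $P_\sigma F=\{(f_t(X_i))_{i=1}^k:t\in T\}=\Gamma T$, then

$$\mathbb{E}\sup_{v\in V}\Big|\sum_{i\in I}g_iv_i\Big|=\frac{1}{\sqrt k}\mathbb{E}\sup_{t\in T}\Big|\sum_{i\in I}g_i\langle t^1+t^2,X_i\rangle\Big|\le\frac{1}{\sqrt k}\mathbb{E}\sup_{t\in T}\Big|\sum_{i\in I}g_i\langle t^1,X_i\rangle\Big|+\frac{1}{\sqrt k}\mathbb{E}\sup_{t\in T}\Big|\sum_{i\in I}g_i\langle t^2,X_i\rangle\Big|.$$

Clearly, for any $u,v\in\mathbb{R}^n$,

$$\|f_u-f_v\|_{L_2^I}=\frac{1}{\sqrt m}\big\|(\langle X_i,u-v\rangle)_{i\in I}\big\|_{\ell_2^I}\qquad\text{and}\qquad\|f_u-f_v\|_{L_2}=|u-v|,$$

and by the shrinking property of $T_2^m$, for every $u,v\in T_2^m$,

$$\big\|(\langle u,X_i\rangle)_{i\in I}-(\langle v,X_i\rangle)_{i\in I}\big\|_{\ell_2^I}\le c_1(L,\delta)\big(m\log(ek/m)\big)^{1/2}|u-v|.$$

Therefore, by Slepian's lemma (see, e.g., [8]) and since $T_2^m\subset T$,

$$\frac{1}{\sqrt k}\mathbb{E}_g\sup_{t\in T}\Big|\sum_{i\in I}g_i\langle t^2,X_i\rangle\Big|\le c_1(L,\delta)\sqrt{\frac mk\log(ek/m)}\,\mathbb{E}_g\sup_{t\in T_2^m}\Big|\sum_{i=1}^ng_it_i\Big|\le c_1(L,\delta)\sqrt{\frac mk\log(ek/m)}\,\mathbb{E}_g\sup_{t\in T}\Big|\sum_{i=1}^ng_it_i\Big|.$$

Also, recall that $P_IF_1^m\subset c_2(L,\delta)\gamma_{2,\tau_m}(F,L_2)W_m$, and, just as in the proof of Theorem 4.3 and by the isotropicity of $\mu$,

$$\gamma_2(P_IF_1^m,|\cdot|)\le c_3\gamma_{2,\tau_m}(F,L_2)\sqrt m\le c_3\gamma_2(F,L_2)\sqrt m=c_3\gamma_2(T,|\cdot|)\sqrt m.$$

Applying the Majorizing Measures theorem,

$$\gamma_2(T,|\cdot|)\le c_4\mathbb{E}\sup_{t\in T}\Big|\sum_{i=1}^ng_it_i\Big|,$$

and thus

$$\frac{1}{\sqrt k}\mathbb{E}\sup_{t\in T}\Big|\sum_{i\in I}g_i\langle t^1,X_i\rangle\Big|\le c_5\sqrt{\frac mk}\,\mathbb{E}\sup_{t\in T}\Big|\sum_{i=1}^ng_it_i\Big|,$$

as claimed. □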

To put Corollary 4.6 in the right context, even if one considers the case where $\mu$ is the canonical Gaussian measure on $\mathbb{R}^n$, the standard concentration estimate for the norm of a Gaussian vector around its mean (used in [14] to prove the result for $I=\{1,\dots,k\}$) is not strong enough to allow uniform control over all subsets of $\{1,\dots,k\}$. What allows one to bypass this obstacle and obtain a result even in a subgaussian setup (in which case such a concentration result does not exist, and thus, even the result for $I=\{1,\dots,k\}$ is not obvious) is the application of a cardinality-sensitive deviation argument rather than a concentration-based method.

Note that the logarithmic term in Corollary 4.6 cannot be removed. For example, if $T=\{t\}$ and $\mu$ is the canonical Gaussian measure on $\mathbb{R}^n$, then the vector $(\langle X_i,t\rangle)_{i=1}^k$ has the same distribution as $|t|(\tilde g_i)_{i=1}^k$, where $(\tilde g_i)_{i=1}^k$ are independent standard Gaussian variables [that are also independent of $(g_i)_{i=1}^k$]. Recall that $E_m$ is the collection of subsets of $\{1,\dots,k\}$ of cardinality $m$ and observe that

$$\mathbb{E}_{\tilde g}\max_{I\in E_m}\Big(\sum_{i\in I}\tilde g_i^2\Big)^{1/2}=\mathbb{E}_{\tilde g}\Big(\sum_{i=1}^m(\tilde g_i^*)^2\Big)^{1/2}\sim\sqrt{m\log(ek/m)},$$

where the last assertion is the second part of Lemma 4.4. Therefore, with probability at least $c_1$, there will be some $I\in E_m$ for which

$$\mathbb{E}_g\Big|\sum_{i\in I}g_i\langle X_i,t\rangle\Big|\ge c_2|t|\sqrt{m\log(ek/m)},$$

showing that indeed, one cannot remove the logarithmic term.

5. Proof of Theorem A. As we explained in previous sections, our method of selecting signs in a way that is better than choosing typical signs depends on two properties. One is that the complexity of the set (as captured, e.g., by $\gamma_{2,s}$ or the metric entropy of the set) is small, and the other is that the set is in a good position (e.g., if coordinate projections shrink the set's complexity). Our results thus far indicate that for a subgaussian class and a typical $\sigma=(X_i)_{i=1}^k$, $P_\sigma F$ is essentially a set in a good position. Thus, it seems likely that the ability to choose signs that outperform the typical behavior of signs will be governed solely by the complexity of $F$. As Theorem A, which we reformulate below, shows, this is indeed the case.
Although the proof of Theorem A is rather technical, the basic idea behind
it is simple. It follows from a combination of the two main results of
the previous sections. First of all, that a typical coordinate projection of a 
subgaussian class is contained in the Minkowski sum of a small set and a 
set that satisfies a strong shrinking property. Second, that the discrepancy 
of sets that satisfy a shrinking property may be bounded in a nontrivial 
manner using their metric complexity. 



Theorem 5.1. For any $0<\delta<1$, $0<p<1/2$ and $L>0$ there are constants $c_1$ and $c_2$ that depend on $\delta$, $p$ and $L$ and for which the following holds. Let $F\subset L_2(\mu)$ be an $L$-subgaussian class, consisting of mean-zero functions. Then, for every $k$ there is a set $A_k\subset\Omega^k$ with $\mu^k(A_k)\ge1-\delta$ such that for every $(X_1,\dots,X_k)\in A_k$ and every $I\subset\{1,\dots,k\}$,

$$\inf_{(\varepsilon_i)_{i\in I}}\sup_{f\in F}\Big|\sum_{i\in I}\varepsilon_if(X_i)\Big|\le\sqrt{|I|}\,\alpha_{|I|},$$

where for every $n\le k$,

$$\alpha_n=c_1\Big(\gamma_{2,\log_2\log_2(c_2n)}(F,L_2)\cdot\sqrt{\log(ek/n)}+\mathrm{diam}(F,L_2)\,\frac{\log(ek)}{n^{1/2-p}}\Big).$$

Before proving Theorem 5.1, let us recall the following notation. For every integer $m$, $s_m$ is the largest integer $s$ such that $2^{2^s}\le\kappa_3m$. If $m\le k$, then $\tau_m$ is the first integer for which $2^{2^{\tau_m}}\ge\exp(m\cdot\log(ek/m))$. In particular, for every $1\le m\le k$, $\tau_m\ge\log_2\log_2k$ (but of course, $\tau_m$ could be much larger). We will also say that for $\sigma=(X_1,\dots,X_k)$, a function class $F$ satisfies the shrinking property on $I\subset\{1,\dots,k\}$ with a constant $c$ if for every $f,h\in F$,

$$\|f-h\|_{L_2^I}\le c\sqrt{\log(ek/|I|)}\,\|f-h\|_{L_2}.$$



Proof of Theorem 5.1. Fix $0<\delta<1$ and consider $\sigma=(X_1,\dots,X_k)$ for which the assertions of Theorem 4.2 hold. Fix any integer $n\le k$ and let $I_0\subset\{1,\dots,k\}$ be of cardinality $n$. Using the notation of Theorem 4.2, we may decompose $F\subset F_1^n+F_2^n$, where $P_{I_0}F_1^n\subset c_1\gamma_{2,\tau_n}(F,L_2)W_n$, $F_2^n\subset F$, and $F_2^n$ satisfies the shrinking property on every $I\subset\{1,\dots,k\}$ of cardinality $n$ with a constant $c=c(L,\delta)$; in particular, it does so on $I_0$.

For every $f\in F$, choose $f_1\in F_1^n$ and $f_2\in F_2^n$ such that $f=f_1+f_2$. Hence, since $\sum_{j=1}^nj^{-1/2}\le2\sqrt n$, for every $(\eta_i)_{i\in I_0}\in\{-1,0,1\}^{I_0}$ and any $f\in F$,

$$\Big|\sum_{i\in I_0}\eta_if(X_i)\Big|\le2c_1\gamma_{2,\tau_n}(F,L_2)\sqrt n+\Big|\sum_{i\in I_0}\eta_if_2(X_i)\Big|.$$




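Let $(F_s)_{s\ge0}$ be an admissible sequence of $F_2^n$, which will be specified later, and set $\pi_s(f_2)$ to be a nearest point to $f_2$ in $F_s$. As in Corollary 3.4, if

$$Q_s^0=\kappa_4\begin{cases}\exp\big(-\kappa_5(n/2^{2^s})^{1/2}\big),&\text{if }s<s_n,\\1,&\text{if }s=s_n,\\2^{s/2},&\text{if }s>s_n,\end{cases}$$

then there exists $(\eta_i^0)_{i\in I_0}\in\{-1,0,1\}^{I_0}$ such that $n/4\le|\{i:\eta_i^0=0\}|\le3n/4$ and for every $(f_2(X_i))_{i\in I_0}\in P_{I_0}F_2^n$,

$$\Big|\sum_{i\in I_0}\eta_i^0f_2(X_i)\Big|\le\sum_{s=1}^\infty Q_s^0\big\|\pi_s(f_2)-\pi_{s-1}(f_2)\big\|_{\ell_2^{I_0}}.$$

Since functions in $F_2^n$ satisfy the shrinking property with a constant $c$, for every $f\in F_2^n$,

$$\big\|\pi_s(f)-\pi_{s-1}(f)\big\|_{\ell_2^{I_0}}\le c\sqrt{n\log(ek/n)}\,\|\pi_s(f)-\pi_{s-1}(f)\|_{L_2},$$

implying that

$$\Big|\sum_{i\in I_0}\eta_i^0f_2(X_i)\Big|\le c\sqrt{n\log(ek/n)}\sum_{s=1}^\infty Q_s^0\|\pi_s(f_2)-\pi_{s-1}(f_2)\|_{L_2}.$$

Let $I_1=\{i\in I_0:\eta_i^0=0\}$ and continue in the same manner: first decompose $F\subset F_1^{|I_1|}+F_2^{|I_1|}$, then apply the fact that $P_{I_1}F_1^{|I_1|}$ is contained in an appropriate weak $\ell_2$ ball, and finally, since $F_2^{|I_1|}$ satisfies the shrinking property on $I_1$, use Corollary 3.4 again, and so on.

As a result of iterating this argument, there are nested subsets $(I_j)_{j=0}^{j_0}$ of $\{1,\dots,k\}$ with $|I_0|=n$, of cardinalities

$$\Big(\frac14\Big)^jn\le|I_j|\le\Big(\frac34\Big)^jn\qquad\text{and}\qquad1\le|I_{j_0}|\le10,$$

and vectors $(\eta_i^j)_{i\in I_j}\in\{-1,0,1\}^{I_j}$ such that $I_{j+1}=\{i:\eta_i^j=0\}$, with the following property. For every $0\le j\le j_0-1$ let

$$Q_s^j=\kappa_4\begin{cases}\exp\big(-\kappa_5(|I_j|/2^{2^s})^{1/2}\big),&\text{if }s<s_{|I_j|},\\1,&\text{if }s=s_{|I_j|},\\2^{s/2},&\text{if }s>s_{|I_j|};\end{cases}$$

then for every $f\in F$, $f=f_1^j+f_2^j$ with $f_1^j\in F_1^{|I_j|}$ and $f_2^j\in F_2^{|I_j|}$, one has

$$\Big|\sum_{i\in I_j}\eta_i^jf(X_i)\Big|\le2c_1\gamma_{2,\tau_{|I_j|}}(F,L_2)\sqrt{|I_j|}+c\sqrt{|I_j|\log(ek/|I_j|)}\sum_{s=1}^\infty Q_s^j\|\pi_s(f_2^j)-\pi_{s-1}(f_2^j)\|_{L_2}.$$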

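Therefore, there are signs $(\varepsilon_i)_{i\in I_0}\in\{-1,1\}^{I_0}$ such that

$$\sup_{f\in F}\Big|\sum_{i\in I_0}\varepsilon_if(X_i)\Big|\le c\sum_{j=0}^{j_0}\sqrt{|I_j|\log(ek/|I_j|)}\sup_{f\in F_2^{|I_j|}}\sum_{s=1}^\infty Q_s^j\|\pi_s(f)-\pi_{s-1}(f)\|_{L_2}$$

$$(5.1)\qquad\qquad+2c_1\sum_{j=0}^{j_0}\sqrt{|I_j|}\,\gamma_{2,\tau_{|I_j|}}(F,L_2)+c_2\,\mathrm{diam}(F,L_2)\log(ek),$$

where the last term comes from a trivial estimate on the discrepancy of a projection of $F$ onto the set of coordinates $\{i\in I_{j_0}:\eta_i^{j_0}=0\}$ and the shrinking phenomenon.

To complete the proof, one has to bound (5.1) from above. To that end, set $b_j=|I_j|$ and recall that $(1/4)^jn\le b_j\le(3/4)^jn$. To estimate the second term in (5.1), note that since $b_j\ge1$, $\tau_{b_j}\ge\log_2\log_2k$. Therefore,

$$\sum_{j=0}^{j_0}\sqrt{|I_j|}\,\gamma_{2,\tau_{|I_j|}}(F,L_2)\le c_3\sqrt n\,\gamma_{2,\log_2\log_2k}(F,L_2).$$

Turning our attention to the first term in (5.1), for every $1\le\ell\le s_n$ let $U_\ell=\{j:s_{b_j}=\ell\}$, set $b_\ell^+=\max\{b_j:j\in U_\ell\}$ and $b_\ell^-=\min\{b_j:j\in U_\ell\}$. In other words, $U_\ell$ consists of all the integers $j$ for which $s_{|I_j|}=s_{b_j}=\ell$, $b_\ell^+$ is the largest cardinality of such a set and $b_\ell^-$ is the smallest one. Since

$$\kappa_3^{-1}2^{2^\ell}\le b_\ell^-\le b_\ell^+\le\min\{\kappa_3^{-1}2^{2^{\ell+1}},n\},$$

then for every $j\in U_\ell$, the sequence $(Q_s^j)_{s\ge0}$ satisfies

$$Q_s^j\le\kappa_4\begin{cases}\exp\big(-\kappa_5\kappa_3^{-1/2}2^{(2^\ell-2^s)/2}\big),&\text{if }s<\ell,\\1,&\text{if }s=\ell,\\2^{s/2},&\text{if }s>\ell,\end{cases}$$

and we denote the right-hand side by $(Q_s^\ell)_{s\ge0}$. Since $b_j$ decays exponentially,

$$\sum_{j=0}^{j_0}\sqrt{b_j\log(ek/b_j)}\sup_f\sum_{s=1}^\infty Q_s^j\|\pi_s(f)-\pi_{s-1}(f)\|_{L_2}\le\sum_{\ell=1}^{s_n}\sum_{j\in U_\ell}\sqrt{b_j\log(ek/b_j)}\sup_f\sum_{s=1}^\infty Q_s^\ell\|\pi_s(f)-\pi_{s-1}(f)\|_{L_2}$$

$$\le c_4\sum_{\ell=1}^{s_n}\sqrt{b_\ell^+\log(ek/b_\ell^+)}\sup_f\sum_{s=1}^\infty Q_s^\ell\|\pi_s(f)-\pi_{s-1}(f)\|_{L_2}.$$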



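Set $d_\ell=\sqrt{b_\ell^+\log(ek/b_\ell^+)}$, fix $0<p<1/2$ and let $\ell_1$ be the largest integer such that $\kappa_3^{-1}2^{2^{\ell_1+1}}\le n^{2p}$. Then, for every $\ell\le\ell_1$, $b_\ell^+\le\kappa_3^{-1}2^{2^{\ell+1}}\le n^{2p}$, and for $\ell>\ell_1$, $b_\ell^+\le n$. Observe that for every $s,\ell$, $Q_s^\ell\le\kappa_42^{s/2}$ and for every $f\in F$, $\|\pi_s(f)-\pi_{s-1}(f)\|_{L_2}\le2\,\mathrm{diam}(F,L_2)$. Therefore,

$$\sum_{\ell=1}^{s_n}d_\ell\sup_f\sum_{s=1}^\infty Q_s^\ell\|\pi_s(f)-\pi_{s-1}(f)\|_{L_2}\le\sum_{\ell=1}^{s_n}d_\ell\sup_f\sum_{s=1}^{\ell_1}Q_s^\ell\|\pi_s(f)-\pi_{s-1}(f)\|_{L_2}+\sum_{\ell=1}^{s_n}d_\ell\sup_f\sum_{s=\ell_1+1}^\infty Q_s^\ell\|\pi_s(f)-\pi_{s-1}(f)\|_{L_2}$$

$$\le2\,\mathrm{diam}(F,L_2)\sum_{\ell=1}^{s_n}\sum_{s=1}^{\ell_1}d_\ell Q_s^\ell+\kappa_4\sum_{\ell=1}^{s_n}d_\ell\sup_f\sum_{s=\ell_1+1}^\infty2^{s/2}\|\pi_s(f)-\pi_{s-1}(f)\|_{L_2}$$

$$\le2\,\mathrm{diam}(F,L_2)\sum_{s=1}^{\ell_1}\sum_{\ell=1}^{s_n}d_\ell Q_s^\ell+c_5(p)\sqrt{n\log(ek/n)}\,\gamma_{2,\ell_1}(F,L_2)$$

for an almost optimal choice of $(F_s)_{s\ge0}$.

Now, fix $s\le\ell_1$. Using that $b_\ell^+\le n^{2p}$ for $\ell\le\ell_1$ and $b_\ell^+\le n$ for $\ell>\ell_1$, and that $x\mapsto x\log(ek/x)$ is increasing on $(0,k]$, it is evident that

$$\sum_{\ell=1}^{s_n}d_\ell Q_s^\ell\le n^p\sqrt{\log(ek/n^{2p})}\sum_{\ell\le\ell_1}Q_s^\ell+\sqrt{n\log(ek/n)}\sum_{\ell>\ell_1}Q_s^\ell=(*).$$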
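Note that $2^{\ell_1+2} > 2p\log_2 c_6 n$ and thus $2^{\ell_1} > (p/2)\log_2 c_6 n$. Therefore, there is an absolute constant $c_7$ such that if $s \leq \ell_1$ then

$$\sum_{\ell\leq\ell_1}Q_s^\ell = \sum_{\ell<s}Q_s^\ell + Q_s^s + \sum_{\ell=s+1}^{\ell_1}Q_s^\ell \leq c_7\left(s2^{s/2} + \exp(-c_3 2^{2^s})\right) \leq 2c_7 s2^{s/2}$$

and

$$\sum_{\ell>\ell_1}Q_s^\ell \leq c_7\exp\left(-c_3 2^{2^{\ell_1}}\right) \leq c_7\exp\left(-c_8 n^{p/2}\right).$$

Hence, there is a constant $c_9(p)$ such that for every $s \leq \ell_1$,

$$(*) \leq c_9(p)\, s2^{s/2} n^p\sqrt{\log(ek/n^{2p})},$$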

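and thus,

$$\begin{aligned}
\operatorname{disc}(P_I F) \leq{}& c_2\operatorname{diam}(F,L_2)\log(ek) + c_3\sqrt{n}\,\gamma_{2,\log_2\log_2 k}(F,L_2)\\
&+ c_{10}\operatorname{diam}(F,L_2)\cdot\ell_1 2^{\ell_1/2}n^p\sqrt{\log(ek/n^{2p})}\\
&+ c_{10}\gamma_{2,\ell_1}(F,L_2)\cdot\sqrt{n\log(ek/n)}.
\end{aligned}$$

Since $(p/2)\log_2 c_6 n < 2^{\ell_1} \leq p\log_2 c_6 n$, the claim follows. □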
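The choice of $\ell_1$ is elementary arithmetic, and it is easy to check the range of $2^{\ell_1}$ used at the end of the proof. A minimal sketch, reading the definition as "the largest integer $\ell_1$ with $\kappa_2 2^{2^{\ell_1+1}} \leq n^{2p}$" and taking the hypothetical default $\kappa_2 = 1$ (so that $c_6 = \kappa_2^{-1/(2p)} = 1$), computes $\ell_1$ in logarithmic form to avoid doubly exponential numbers, and checks $(p/2)\log_2(c_6 n) < 2^{\ell_1} \leq p\log_2(c_6 n)$:

```python
import math

def ell_1(n, p, kappa_2=1.0):
    """Largest integer l >= 0 with kappa_2 * 2**(2**(l + 1)) <= n**(2p).

    Taking log2 twice, the condition reads 2**(l + 1) <= log2(n**(2p) / kappa_2).
    Returns None when even l = 0 fails.
    """
    budget = 2 * p * math.log2(n) - math.log2(kappa_2)  # log2(n^{2p} / kappa_2)
    if budget < 2:
        return None
    l = 0
    while 2 ** (l + 2) <= budget:  # is l + 1 still admissible?
        l += 1
    return l

def bracket_holds(n, p, kappa_2=1.0):
    """Check (p/2) log2(c6 n) < 2**ell_1 <= p log2(c6 n), with c6 = kappa_2**(-1/(2p))."""
    l = ell_1(n, p, kappa_2)
    if l is None:
        return True  # no admissible l, nothing to check
    log_c6n = math.log2(n) - math.log2(kappa_2) / (2 * p)
    return (p / 2) * log_c6n < 2 ** l <= p * log_c6n

print(ell_1(10**6, 0.25))  # 2
```

Since $2^{\ell_1}$ is of order $p\log_2 n$, the level $\ell_1$ is $\log_2\log_2 n$ up to an additive constant depending on $p$.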

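Corollary 5.2. Let $0 < p < 1/2$. Under the assumptions of Theorem 5.1 and using its notation, for every $\sigma \in A_k$,

$$\operatorname{disc}(P_\sigma F) \leq c_1\left(\sqrt{k}\cdot\gamma_{2,\log_2\log_2(c_2k)}(F,L_2) + k^p\operatorname{diam}(F,L_2)\right)$$

and

$$\operatorname{Hdisc}(P_\sigma F) = \sup_{I\subset\{1,\ldots,k\}}\inf_{(\varepsilon_i)_{i\in I}}\sup_{f\in F}\left|\sum_{i\in I}\varepsilon_i f(X_i)\right| \leq \sup_{1\leq n\leq k}a_n\sqrt{n}.$$

In particular, if $\lim_{s\to\infty}\gamma_{2,s}(F,L_2) = 0$ (i.e., if $F$ is $\mu$-pregaussian), then

$$\frac{1}{\sqrt{k}}\operatorname{Hdisc}\left(\{(f(X_i))_{i=1}^k : f \in F\}\right)$$

converges in probability to 0.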
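For very small finite sets, the two quantities in Corollary 5.2 can be computed exhaustively, which is a convenient sanity check on the definitions. The following brute-force sketch (exponential in the dimension, so only usable for a handful of coordinates; the function names are ours) implements $\operatorname{disc}$ and $\operatorname{Hdisc}$ for a finite $V \subset \mathbb{R}^k$ given as a list of tuples:

```python
import itertools

def disc(V, coords=None):
    """min over sign vectors (eps_i)_{i in coords} of max_{v in V} |sum_i eps_i v_i|."""
    coords = list(range(len(V[0]))) if coords is None else list(coords)
    if not coords:
        return 0
    return min(
        max(abs(sum(e * v[i] for e, i in zip(eps, coords))) for v in V)
        for eps in itertools.product((-1, 1), repeat=len(coords))
    )

def hdisc(V):
    """Hereditary discrepancy: max over nonempty coordinate subsets I of disc(V, I)."""
    k = len(V[0])
    return max(
        disc(V, I)
        for r in range(1, k + 1)
        for I in itertools.combinations(range(k), r)
    )

# A single all-ones vector in R^4: signs can balance it, but not on odd subsets.
ones = [(1, 1, 1, 1)]
print(disc(ones), hdisc(ones))  # 0 1
```

The example shows why the hereditary version can be strictly larger than the discrepancy itself.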

Let us mention once again that Theorem 5.1 is meaningful because, for a typical $\sigma$, a class of mean zero functions that is $L$-subgaussian satisfies

$$c_1\sigma_F\sqrt{k} \leq \mathbb{E}_\varepsilon\sup_{f\in F}\left|\sum_{i=1}^k\varepsilon_i f(X_i)\right| \leq c_2(L)\gamma_2(F,L_2)\sqrt{k}.$$

Thus, there is a true gap between the discrepancy (or even the hereditary 
discrepancy) of a typical coordinate projection and the average over signs 
of a coordinate projection of a pregaussian, subgaussian class F. 
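The gap between the infimum over signs and the average over signs can already be seen in a toy case (which is not one of the classes covered by Theorem 5.1). For the single vector $(1,\ldots,1) \in \mathbb{R}^k$ with $k$ even, the infimum over signs is $0$, while the average over signs is $\mathbb{E}|\sum_{i=1}^k\varepsilon_i|$, which grows like $\sqrt{2k/\pi}$. A small exhaustive sketch:

```python
import itertools
import math

def sign_average(V):
    """E_eps max_{v in V} |sum_i eps_i v_i|, computed exactly over all 2^k sign vectors."""
    k = len(V[0])
    total = sum(
        max(abs(sum(e * x for e, x in zip(eps, v))) for v in V)
        for eps in itertools.product((-1, 1), repeat=k)
    )
    return total / 2 ** k

def inf_over_signs(V):
    """min over sign vectors of max_{v in V} |sum_i eps_i v_i| (the discrepancy)."""
    k = len(V[0])
    return min(
        max(abs(sum(e * x for e, x in zip(eps, v))) for v in V)
        for eps in itertools.product((-1, 1), repeat=k)
    )

for k in (4, 8, 12, 16):
    ones = [(1,) * k]
    print(k, inf_over_signs(ones), round(sign_average(ones), 3),
          round(math.sqrt(2 * k / math.pi), 3))
```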
6. Equivalence for large sets. In this section, our aim is to show that if $F$ is a subgaussian class that indexes a bounded Gaussian process, then the reason for the gap between the expectation over signs of a random coordinate projection and the infimum over signs is indeed that $\lim_{s\to\infty}\gamma_{2,s}(F,L_2) = 0$.

To be more precise, we show the following. 

Theorem 6.1. For every $0 < \delta < 1$ and $A,B,L > 0$ there is a constant $c(\delta,A,B,L)$ for which the following holds. Let $F \subset B(L_2(\mu))$ be a class of mean zero functions such that $\operatorname{absconv}(F)$ is $L$-subgaussian. If $\gamma_2(F,L_2(\mu)) \leq A < \infty$ and if the entropy numbers satisfy

$$\limsup_{j\to\infty} j^{1/2}e_j(\operatorname{absconv}(F),L_2(\mu)) = B > 0,$$

then there is a sequence of integers $(k_i)_{i=1}^\infty$ tending to infinity, such that for every $i$, with probability at least $1-\delta$ in $\Omega^{k_i}$,

$$\operatorname{Hdisc}(P_\sigma F) \geq c(\delta,A,B,L)\sqrt{k_i},$$

where $\sigma = (X_1,X_2,\ldots,X_{k_i}) \in \Omega^{k_i}$ is selected according to $\mu^{k_i}$. In particular, $\operatorname{Hdisc}(P_\sigma F)/\sqrt{k}$ does not converge to $0$ in probability.

Observe that this is almost the reverse direction of Theorem A. Indeed, it is well known (see, e.g., [3], Chapter 9) that there is no entropic characterization of classes that index a bounded Gaussian process which is not continuous; such a characterization is given by a majorizing measures argument [17]. However, because $\{G_f : f \in F\}$ is a bounded process with a covariance structure endowed by $L_2(\mu)$, Sudakov's inequality (see, e.g., [8], Chapter 3) implies that

$$\log N(\varepsilon,\operatorname{absconv}(F),L_2(\mu)) \leq c_1\left(\frac{\mathbb{E}\sup_{f\in F}|G_f|}{\varepsilon}\right)^2.$$

On the other hand, since $F \subset B(L_2(\mu))$ is not $\mu$-pregaussian, one can show that

$$\int_0^1\sqrt{\log N(\varepsilon,F,L_2(\mu))}\,d\varepsilon = \infty.$$

Thus, up to a logarithmic factor, the entropy numbers of $F$ are as in Theorem 6.1. Whether Theorem 6.1 remains true using only the assumption that $\limsup_{s\to\infty}\gamma_{2,s}(F,L_2(\mu)) > 0$ is not clear.

The idea behind the proof of Theorem 6.1 is to find a cube in a typical coordinate projection of $\operatorname{absconv}(F)$. We will first show that if $\operatorname{absconv}(F)$ has a "large" separated set with respect to the $L_2(\mu)$ metric at scale $\sim 1/\sqrt{k}$, then its typical coordinate projection of dimension $k$ contains a cubic structure of dimension $\sim k$ and scale $\sim 1/\sqrt{k}$. The cubic structure we will be interested in is captured by the combinatorial dimension.
Definition 6.2. Let $F$ be a class of functions on $\Omega$. For every $\varepsilon > 0$, a set $\sigma = \{x_1,\ldots,x_j\} \subset \Omega$ is said to be $\varepsilon$-shattered by $F$ if there is some function $s : \sigma \to \mathbb{R}$ such that for every $I \subset \{1,\ldots,j\}$ there is some $f_I \in F$ for which $f_I(x_i) \geq s(x_i) + \varepsilon$ if $i \in I$, and $f_I(x_i) \leq s(x_i) - \varepsilon$ if $i \notin I$. Define the combinatorial dimension at scale $\varepsilon$ by

$$\mathrm{VC}(F,\varepsilon) = \sup\{|\sigma| : \sigma \subset \Omega,\ \sigma \text{ is } \varepsilon\text{-shattered by } F\}.$$

Note that if $F$ is a $\{0,1\}$-class of functions then $\mathrm{VC}(F) = \mathrm{VC}(F,1/2)$. Also, in a similar way one may define the combinatorial dimension of a subset of $\mathbb{R}^n$, when each vector is viewed as a function defined on $\{1,\ldots,n\}$.
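For a finite subset $V \subset \mathbb{R}^n$, Definition 6.2 can be checked directly once a candidate set of coordinates and a level function are fixed. A minimal sketch (the helper name is hypothetical and coordinates are 0-based here), exhaustive over all subsets $I$ of the candidate set:

```python
import itertools

def is_shattered(V, coords, levels, eps):
    """Return True if the coordinates `coords` are eps-shattered by the vectors in V
    with level function `levels` (levels[j] is the level at coordinate coords[j]):
    for every subset I of coords, some v in V has v[i] >= s_i + eps for i in I
    and v[i] <= s_i - eps for i not in I."""
    for r in range(len(coords) + 1):
        for subset in itertools.combinations(coords, r):
            inside = set(subset)
            witnessed = any(
                all(v[i] >= s + eps if i in inside else v[i] <= s - eps
                    for i, s in zip(coords, levels))
                for v in V
            )
            if not witnessed:
                return False
    return True

# The discrete cube {-1, 1}^3 is 1-shattered (levels 0) on all three coordinates,
# so VC(cube, 1) >= 3; it is not 1.5-shattered with the same levels.
cube = list(itertools.product((-1, 1), repeat=3))
print(is_shattered(cube, (0, 1, 2), (0, 0, 0), 1.0))  # True
print(is_shattered(cube, (0, 1, 2), (0, 0, 0), 1.5))  # False
```

Computing $\mathrm{VC}(V,\varepsilon)$ itself would also require a search over level functions; the sketch only certifies lower bounds.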

It is standard to verify that if $\mathrm{VC}(F,\varepsilon) \geq m$, then the coordinate projection $P_\tau F$, defined by the shattered set $\tau$, contains a subset of cardinality $\exp(cm)$ which is $c_1\varepsilon$-separated with respect to the $L_2(\mu_\tau)$ norm (recall that $\mu_\tau$ is the uniform probability measure supported on $\tau$), and that $\operatorname{disc}(P_\tau F) \geq c_2m\varepsilon$ (see Lemma 6.5). As we mentioned in the Introduction, the reverse direction is also true: if $F \subset B(L_\infty(\mu))$ contains a large well-separated set in $L_2(\mu)$, then it must have a large combinatorial dimension at a scale that is proportional to the scale of the separation (see [12] for an exact statement and proof). A fact that will be used here, and which is based on this reverse direction, is the following.

Theorem 6.3 [12]. There exist absolute constants $c_1$ and $c_2$ for which the following holds. Let $V \subset B_\infty^k$ and assume that $\mathbb{E}\sup_{v\in V}|\sum_{i=1}^k\varepsilon_iv_i| \geq \delta k$. Then,

$$\mathrm{VC}(V,c_1\delta) \geq c_2\delta^2k.$$

Hence, the only reason that $\mathbb{E}\sup_{v\in V}|\sum_{i=1}^k\varepsilon_iv_i|$ is almost extremal is that $V$ contains a large cube in a high-dimensional coordinate projection.
The key observation of this section is the following theorem.



Theorem 6.4. For every $A,B,L > 0$ and $0 < \delta < 1$ there exist constants $c_1$ and $c_2$ that depend on $A$, $B$, $L$ and $\delta$ for which the following holds. Let $F \subset B(L_2(\mu))$ be a convex, symmetric, $L$-subgaussian set of mean zero functions. Suppose that $\gamma_2(F,\psi_2) \leq A < \infty$ and that there is some $k$ for which $e_k(F,L_2(\mu)) \geq B/\sqrt{k}$. Then, there is a set $S \subset \Omega^k$ such that $\mu^k(S) \geq 1-\delta$ and for every $\sigma \in S$,

$$\mathrm{VC}\left(P_\sigma F,\frac{c_1}{\sqrt{k}}\right) \geq c_2k.$$



Theorem 6.4 implies Theorem 6.1 because of the next lemma. 
Lemma 6.5. If $T \subset \mathbb{R}^n$ then

$$\operatorname{Hdisc}(T) \geq \sup_{\delta>0}\delta\,\mathrm{VC}(\operatorname{absconv}(T),\delta).$$



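Proof. First, note that $\operatorname{Hdisc}(T) = \operatorname{Hdisc}(\operatorname{absconv}(T))$, and thus we may assume that $T$ is convex and symmetric. Now, let $I \subset \{1,\ldots,n\}$ be $\delta$-shattered by $T$ with the level function $s$. Fix $(\varepsilon_i)_{i\in I} \in \{-1,1\}^{|I|}$ and, without loss of generality, assume that $\sum_{i\in I}\varepsilon_is_i \geq 0$. Since $I$ is $\delta$-shattered by $T$, there is some $t' \in T$ for which $t'_i \geq s_i + \delta$ when $\varepsilon_i = 1$ and $t'_i \leq s_i - \delta$ when $\varepsilon_i = -1$. Thus,

$$\sup_{t\in T}\left|\sum_{i\in I}\varepsilon_it_i\right| \geq \sum_{i\in I}\varepsilon_it'_i \geq \sum_{i\in I}(\varepsilon_is_i + \delta) \geq \delta|I|,$$

as claimed. □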
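The argument can be checked numerically on the simplest shattered configuration. The construction below is ours, not taken from the text: $T$ consists of the points $s+\delta\sigma$, $\sigma \in \{-1,1\}^n$, together with their reflections, so that $T$ is symmetric and $\delta$-shattered on all $n$ coordinates with level vector $s$. The sketch verifies by brute force that every choice of signs leaves $\sup_{t\in T}|\sum_i\varepsilon_it_i| \geq \delta n$. Since passing from $T$ to $\operatorname{absconv}(T)$ can only increase these suprema, this is a slightly stronger check than the lemma requires.

```python
import itertools
import random

def shattered_cube(levels, delta):
    """Points s + delta*sigma for all sign vectors sigma, plus their reflections.
    The resulting set is symmetric and delta-shattered with level function s."""
    pts = [tuple(s + delta * g for s, g in zip(levels, sigma))
           for sigma in itertools.product((-1, 1), repeat=len(levels))]
    return pts + [tuple(-x for x in p) for p in pts]

def discrepancy(T):
    """min over signs eps of max_{t in T} |sum_i eps_i t_i| (brute force)."""
    n = len(T[0])
    return min(max(abs(sum(e * x for e, x in zip(eps, t))) for t in T)
               for eps in itertools.product((-1, 1), repeat=n))

rng = random.Random(0)
n, delta = 5, 0.3
levels = [rng.uniform(-1.0, 1.0) for _ in range(n)]
T = shattered_cube(levels, delta)
print(discrepancy(T) >= delta * n - 1e-12)  # True, as Lemma 6.5 predicts
```

When $\sum_i\varepsilon_is_i < 0$, the witness is the reflected point $-s+\delta\varepsilon$, which is the role played by symmetry in the proof.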



Hence, from here on we may assume without loss of generality that the 
class F is convex and symmetric, and that it is L-subgaussian. 

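The proof of Theorem 6.4 requires several additional facts. To formulate them, denote for $V \subset \mathbb{R}^n$

$$\ell_*(V) = \mathbb{E}\sup_{v\in V}\sum_{i=1}^n g_iv_i,$$

where $(g_i)_{i=1}^n$ are independent standard Gaussian random variables, and if $A, B \subset \mathbb{R}^n$, set $N(A,B)$ to be the minimal number of translates of $B$ needed to cover $A$.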

The first lemma we need is taken from [9]. 

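Lemma 6.6. Let $V \subset \mathbb{R}^n$ be a convex, symmetric set. For $\rho > 0$, set $V_\rho = V\cap\rho B_2^n$ and $F(\rho) = \ell_*(V)/\ell_*(V_\rho)$. Then,

$$N(V,8\rho B_2^n) \leq \exp\left(2\left(\frac{\ell_*(V_\rho)}{\rho}\right)^2\log(6F(\rho))\right).$$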
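The Gaussian mean width $\ell_*$ appearing in Lemma 6.6 is easy to estimate for finite point sets. A minimal Monte Carlo sketch follows (standard library only; the function name, sample size and seed are arbitrary choices of ours). For the vertices of the cube $\{-1,1\}^n$ the exact value is $\mathbb{E}\sum_i|g_i| = n\sqrt{2/\pi}$, which the estimate should approximately reproduce. For a finite $V$, the supremum of a linear functional over $V$ equals the supremum over its convex hull, so the same estimate applies to $\operatorname{conv}(V)$.

```python
import itertools
import math
import random

def gaussian_mean_width(V, n_samples=20_000, seed=0):
    """Monte Carlo estimate of l_*(V) = E sup_{v in V} sum_i g_i v_i for a finite V."""
    rng = random.Random(seed)
    n = len(V[0])
    total = 0.0
    for _ in range(n_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(sum(gi * vi for gi, vi in zip(g, v)) for v in V)
    return total / n_samples

n = 3
cube = list(itertools.product((-1, 1), repeat=n))
print(round(gaussian_mean_width(cube), 2), round(n * math.sqrt(2 / math.pi), 2))
```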

The second result was proved in [13] (Theorem 2.3). Although it was formulated there for subsets of $\mathbb{R}^n$, its proof shows that the claim is true for any subgaussian class of functions. It implies that a random coordinate projection of $F$, viewed as a mapping between $L_2(\mu)$ and $L_2^k$, is almost norm preserving for functions with a sufficiently large $L_2(\mu)$ norm.

Theorem 6.7. There exist absolute constants $c_1$ and $c_2$ for which the following holds. Let $F \subset L_2(\mu)$ be a convex, symmetric, $L$-subgaussian class of functions. For every $\theta > 0$ and any positive integer $k$, set

$$r_k^*(\theta) = \inf\left\{\rho > 0 : \gamma_2(F\cap\rho B(L_2(\mu)),\psi_2) \leq \theta\rho\sqrt{k}\right\}.$$

Then, with probability at least $1-2\exp(-c_1\theta^2k/L^4)$, for every $f \in F$ such that $\|f\|_{L_2(\mu)} \geq r_k^*(\theta/c_2L^2)$,

$$(1-\theta)^{1/2}\|f\|_{L_2(\mu)} \leq \|f\|_{L_2^k} \leq (1+\theta)^{1/2}\|f\|_{L_2(\mu)}.$$

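Corollary 6.8. For every $L > 0$, there are constants $\kappa_6$ and $\kappa_7$ that depend only on $L$ for which the following holds. Let $F$ be an $L$-subgaussian, convex and symmetric class of functions for which $\gamma_2(F,\psi_2) \leq A < \infty$. Then, with probability at least $1-2\exp(-\kappa_6k)$, if $f \in F$ and $\|f\|_{L_2(\mu)} \geq \kappa_7A/\sqrt{k}$ then

$$\frac{1}{2}\|f\|_{L_2(\mu)} \leq \|f\|_{L_2^k} \leq \frac{3}{2}\|f\|_{L_2(\mu)}.$$

In particular, if $H \subset F$ is an $\varepsilon$-separated set in $L_2(\mu)$ for $\varepsilon \geq 2\kappa_7A/\sqrt{k}$, then with probability at least $1-2\exp(-\kappa_6k)$, $P_\sigma H$ is $\varepsilon/4$-separated in $L_2^k$.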



Proof. Let $c_1$ and $c_2$ be as in Theorem 6.7. Observe that

$$\gamma_2(F\cap\rho B(L_2(\mu)),\psi_2) \leq \gamma_2(F,\psi_2) \leq A$$

and apply Theorem 6.7 for $\theta = 1/2$. Thus, $r_k^*(\theta/c_2L^2) \leq c_3(L)A/\sqrt{k}$, implying that if $c_4(L) = c_1/4L^4$ then, with probability at least $1-2\exp(-c_4(L)k)$, if $\|f\|_{L_2(\mu)} \geq c_3(L)A/\sqrt{k}$ then

$$\frac{1}{2}\|f\|_{L_2(\mu)} \leq \|f\|_{L_2^k} \leq \frac{3}{2}\|f\|_{L_2(\mu)}.$$

Turning to the second part, note that if $H \subset F$ is $\varepsilon$-separated in $L_2(\mu)$ for $\varepsilon \geq 2c_3(L)A/\sqrt{k}$, then for every $h_1,h_2 \in H$, $f = (h_1-h_2)/2 \in F$ and $\|f\|_{L_2(\mu)} \geq c_3(L)A/\sqrt{k}$. Thus, the second part follows from the first one. □
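Corollary 6.8 says that, with high probability, the empirical norm $\|f\|_{L_2^k} = (k^{-1}\sum_{i=1}^kf(X_i)^2)^{1/2}$ lies between $1/2$ and $3/2$ times $\|f\|_{L_2(\mu)}$ for all sufficiently large $f$ in the class. A simulation sketch, assuming the simplest subgaussian example (our choice, not one from the text): $\mu$ is the standard Gaussian measure on $\mathbb{R}^d$ and $f_t(x) = \langle t,x\rangle$, so that $\|f_t\|_{L_2(\mu)} = |t|$. The parameters $d$, $k$ and the number of functions are arbitrary.

```python
import math
import random

def empirical_norm(t, sample):
    """(k^{-1} sum_i f_t(X_i)^2)^{1/2} for the linear functional f_t(x) = <t, x>."""
    k = len(sample)
    return math.sqrt(sum(sum(a * b for a, b in zip(t, x)) ** 2 for x in sample) / k)

def norm_ratios(num_functions=20, d=5, k=2000, seed=1):
    """Ratios ||f_t||_{L_2^k} / ||f_t||_{L_2(mu)} for random directions t and one sample."""
    rng = random.Random(seed)
    sample = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    ratios = []
    for _ in range(num_functions):
        t = [rng.gauss(0.0, 1.0) for _ in range(d)]
        true_norm = math.sqrt(sum(a * a for a in t))  # ||f_t||_{L_2(mu)} = |t|
        ratios.append(empirical_norm(t, sample) / true_norm)
    return ratios

ratios = norm_ratios()
print(min(ratios), max(ratios))  # both close to 1, well inside [1/2, 3/2]
```

The same sample $X_1,\ldots,X_k$ is reused for all functions, mirroring the uniform statement of the corollary.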

Now we can formulate the first localization result, showing that the richness of a typical coordinate projection comes from the intersection of $F$ with a ball of radius $\sim 1/\sqrt{k}$.



Theorem 6.9. For every positive $A$, $B$, $L$ and $0 < \delta < 1$, there are constants $c > 1$, $c_1$, $c_2$ and $c_3$ depending on $A$, $B$, $L$ and $\delta$ for which the following holds. Let $F \subset B(L_2(\mu))$ be a convex, symmetric, $L$-subgaussian class of mean zero functions such that $\gamma_2(F,\psi_2) \leq A < \infty$. Fix an integer $k$ and assume that $e_k(F,L_2(\mu)) \geq B/\sqrt{k}$. Then, with probability at least $1-\delta-2\exp(-c_1k)$,

$$\mathbb{E}_\varepsilon\sup_{f\in F\cap(c_2/\sqrt{k})B(L_2(\mu))}\left|\sum_{i=1}^{ck}\varepsilon_if(X_i)\right| \geq c_3\sqrt{k}.$$
Proof. Since $F$ is $L$-subgaussian, Sudakov's inequality allows us to assume without loss of generality that $A/B \geq 1$. Let $H$ be a maximal $B/\sqrt{k}$-separated set in $F$, with $\log|H| \geq k$. Let $k' = c^2k$ for a constant $c > 1$ to be named later. Since $H$ is $\varepsilon = cB/\sqrt{k'}$-separated in $L_2(\mu)$, then by Corollary 6.8, with probability at least $1-2\exp(-\kappa_6k') = 1-2\exp(-c_1(L)k)$, if $\varepsilon \geq 2\kappa_7A/\sqrt{k'}$ then $P_\sigma H$ is $\varepsilon/4$-separated in $L_2^{k'}$. Moreover, if $f \in F$ satisfies $\|f\|_{L_2(\mu)} \geq \kappa_7A/\sqrt{k'}$ then

$$\frac{1}{2}\|f\|_{L_2(\mu)} \leq \|f\|_{L_2^{k'}} \leq \frac{3}{2}\|f\|_{L_2(\mu)}. \tag{6.1}$$

Clearly, the condition on $\varepsilon$ holds if $c \sim_L A/B$, and since $c > 1$ it follows that $k' > k$.

Consider the set $U = \operatorname{absconv}(H)$. By the Majorizing Measures theorem and a simple application of Theorem 4.2, with probability at least $1-\delta$, for $|\sigma| = k'$,

$$\ell_*(P_\sigma U) \leq c_2\gamma_2(P_\sigma U,|\cdot|) \leq c_3(L,\delta)A\sqrt{k'}. \tag{6.2}$$

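Let $\sigma = (X_i)_{i=1}^{k'}$ be in the intersection of the two events given by (6.1) and (6.2), set $V = P_\sigma U$ and note that $B(L_2^{k'}) = \sqrt{k'}B_2^{k'}$. Therefore,

$$k \leq \log N\left(V,\frac{\varepsilon}{4}\sqrt{k'}B_2^{k'}\right) = \log N(V,c_4B_2^{k'}) = \log N(V,8\rho B_2^{k'}), \tag{6.3}$$

where $c_4 \sim_L A$ (and thus $\rho \sim_L A$ as well). If $V_\rho = V\cap\rho B_2^{k'}$, then by Lemma 6.6, (6.2) and (6.3),

$$k \leq 2\left(\frac{\ell_*(V_\rho)}{\rho}\right)^2\log(6F(\rho)) \leq 2\left(\frac{\ell_*(V_\rho)}{\rho}\right)^2\log\left(\frac{6c_3(L,\delta)A\sqrt{k'}}{\ell_*(V_\rho)}\right).$$

Solving this inequality for $\ell_*(V_\rho)$, it is evident that there exists a constant $c_5 \sim_{L,\delta} B/\sqrt{\log(c_6A^2/B^2)}$ (where $c_6$ depends on $L$ and $\delta$) for which $\ell_*(V_\rho) \geq c_5\sqrt{k'}$. Since $F$ is convex and symmetric and $H \subset F$, then

$$V_\rho = P_\sigma(\operatorname{absconv}(H))\cap\rho B_2^{k'} \subset P_\sigma\left\{f\in F : \|f\|_{L_2^{k'}} \leq \frac{\rho}{\sqrt{k'}}\right\},$$

and by (6.1),

$$P_\sigma\left\{f\in F : \|f\|_{L_2^{k'}} \leq \frac{\rho}{\sqrt{k'}}\right\} \subset P_\sigma\left\{f\in F : \|f\|_{L_2(\mu)} \leq \frac{\max\{2\rho,\kappa_7A\}}{\sqrt{k'}}\right\}.$$

Hence, there is a constant $c_7 \sim_L A$ for which, with probability at least $1-\delta-2\exp(-c_1k)$,

$$V_\rho \subset P_\sigma\left\{f\in F : \|f\|_{L_2(\mu)} \leq \frac{c_7}{\sqrt{k'}}\right\},$$

implying that

$$c_5\sqrt{k'} \leq \mathbb{E}_\varepsilon\sup_{\{f\in F:\|f\|_{L_2(\mu)}\leq c_7/\sqrt{k}\}}\left|\sum_{i=1}^{k'}\varepsilon_if(X_i)\right|.$$

□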



The next step in the proof of Theorem 6.4 is a second localization argument. Theorem 6.9 shows that, under our assumptions, there is a small ball (of radius $\sim 1/\sqrt{k}$) in $F$ that causes coordinate projections of $F$ of dimension $\sim k$ to be "rich." Now, one has to localize even further by truncating the functions in $F_1 = F\cap(c/\sqrt{k})B(L_2(\mu))$.

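Definition 6.10. For every $\beta > 0$ and every $f \in F$, let

$$f_\beta^- = f\,\mathbb{1}_{\{|f|<\beta\}} + \operatorname{sgn}(f)\beta\,\mathbb{1}_{\{|f|\geq\beta\}}$$

and $f_\beta^+ = f - f_\beta^-$. For every $\sigma = (X_1,\ldots,X_k)$ let

$$V_\beta^- = \{(f_\beta^-(X_i))_{i=1}^k : f\in F\}, \qquad V_\beta^+ = \{(f_\beta^+(X_i))_{i=1}^k : f\in F\}.$$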
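Coordinatewise, the decomposition in Definition 6.10 is clipping followed by soft-thresholding: $f_\beta^- = \max(-\beta,\min(\beta,f))$ and $f_\beta^+ = f - f_\beta^- = \operatorname{sgn}(f)\max(|f|-\beta,0)$. The convention at $|f| = \beta$ does not matter, since both branches give $\operatorname{sgn}(f)\beta$ there. A minimal sketch acting on the vector $(f(X_i))_{i=1}^k$ (the function names are ours):

```python
def lower_part(x, beta):
    """f_beta^-: x if |x| < beta, sgn(x) * beta otherwise (i.e. clip x to [-beta, beta])."""
    return max(-beta, min(beta, x))

def upper_part(x, beta):
    """f_beta^+ = f - f_beta^-, which equals sgn(x) * max(|x| - beta, 0)."""
    return x - lower_part(x, beta)

def split_vector(values, beta):
    """Split (f(X_1), ..., f(X_k)) into the corresponding points of V_beta^- and V_beta^+."""
    return ([lower_part(v, beta) for v in values],
            [upper_part(v, beta) for v in values])

low, high = split_vector([0.5, -2.0, 3.0, -0.1], beta=1.0)
print(low)   # [0.5, -1.0, 1.0, -0.1]
print(high)  # [0.0, -1.0, 2.0, 0.0]
```

The first part always lies in $\beta B_\infty^k$, which is what makes the sign-embedding theorem applicable to it later on.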

Proof of Theorem 6.4. First, by Theorem 6.9, with probability at least $1-\delta-2\exp(-c_1k)$,

$$\mathbb{E}_\varepsilon\sup_{f\in F\cap(c_2/\sqrt{k})B(L_2(\mu))}\left|\sum_{i=1}^{ck}\varepsilon_if(X_i)\right| \geq c_3\sqrt{k},$$

where $c > 1$. Set

$$H = F\cap\frac{c_2}{\sqrt{k}}B(L_2(\mu))$$

and note that by the proof of Theorem 4.2 for the class $H$ and $m = ck$, each $h \in H$ can be written as $h = h_1 + h_2$, where $h_1 \in H - H \subset 2H$ (by the convexity and symmetry of $F$) and $h_2 \in H_2$. Moreover, if we write $H \subset H_1 + H_2$, then with $\mu^{ck}$-probability $1-\delta$, $P_\sigma H_1 \subset c_4\gamma_2(F,\psi_2)W_{ck} \subset c_4AW_{ck}$, where $c_4 = c_4(L,\delta)$. By a standard concentration argument, similar to the one used in Theorem 4.2, since $|H_2| \leq \exp(c_5k)$, for every $h_2 \in H_2$, $\|h_2\|_{L_2^{ck}} \leq c_6\|h_2\|_{L_2(\mu)} \leq c_7/\sqrt{k}$. Thus, $P_\sigma H_2 \subset (c_7/\sqrt{k})B(L_2^{ck}) = c_7\sqrt{c}\,B_2^{ck}$, and since $B_2^{ck} \subset W_{ck}$,

$$P_\sigma H \subset P_\sigma H_1 + P_\sigma H_2 \subset c_8W_{ck},$$

where $c_8 = c_8(A,\delta)$.

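Let $\sigma = (X_i)_{i=1}^{ck}$ be a sample for which the above estimates hold, fix $\beta$ to be named later and let $V_\beta^-$ and $V_\beta^+$ be as in Definition 6.10 for the set $H$. Consider the set

$$W_\beta = \left\{x\in\mathbb{R}^{ck} : x_i^* \leq \frac{c_8}{\sqrt{i}} - \beta \text{ for } i \leq (c_8/\beta)^2,\ \ x_i^* = 0 \text{ for } i > (c_8/\beta)^2\right\}$$

and observe that $V_\beta^+ \subset W_\beta$. Therefore, if we set $i_\beta = (c_8/\beta)^2$ and select $\beta$ to satisfy $1 \leq i_\beta \leq ck$, then

$$\mathbb{E}\sup_{v\in V_\beta^+}\left|\sum_{i=1}^{ck}\varepsilon_iv_i\right| \leq \mathbb{E}\sup_{x\in W_\beta}\left|\sum_{i=1}^{ck}\varepsilon_ix_i\right| \leq c_9\sqrt{i_\beta\log(ek/i_\beta)} \leq \frac{c_3\sqrt{k}}{2}$$

for an appropriate choice of $\beta \sim c'/\sqrt{k}$. Since $V = P_\sigma H \subset V_\beta^- + V_\beta^+$,

$$c_3\sqrt{k} \leq \mathbb{E}\sup_{v\in V}\left|\sum_{i=1}^{ck}\varepsilon_iv_i\right| \leq \mathbb{E}\sup_{v\in V_\beta^-}\left|\sum_{i=1}^{ck}\varepsilon_iv_i\right| + \mathbb{E}\sup_{v\in V_\beta^+}\left|\sum_{i=1}^{ck}\varepsilon_iv_i\right|.$$

Therefore,

$$\mathbb{E}\sup_{v\in V_\beta^-}\left|\sum_{i=1}^{ck}\varepsilon_iv_i\right| \geq \frac{c_3\sqrt{k}}{2} \geq c_{10}\beta k.$$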
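The inclusion $V_\beta^+ \subset W_\beta$ rests on a one-line fact. Suppose that $x_i^* \leq c_8/\sqrt{i}$ for every $i$; this is what membership in $c_8W_{ck}$ provides if $W_{ck}$ is read as the weak-$\ell_2$ ball $\{x : x_i^* \leq 1/\sqrt{i}\}$, an assumption on our part since $W_{ck}$ is defined before this section. Then soft-thresholding at level $\beta$ produces the rearrangement $(x_i^*-\beta)_+$, which vanishes as soon as $c_8/\sqrt{i} < \beta$. A randomized sketch of this check (helper names and parameters are hypothetical):

```python
import math
import random

def rearrangement(x):
    """Nonincreasing rearrangement (x_1^*, x_2^*, ...) of (|x_i|)."""
    return sorted((abs(v) for v in x), reverse=True)

def in_scaled_weak_l2(x, c):
    """x_i^* <= c / sqrt(i) for all i (i is 1-based)."""
    return all(v <= c / math.sqrt(i) for i, v in enumerate(rearrangement(x), start=1))

def in_W_beta(y, c, beta, tol=1e-12):
    """y_i^* <= c/sqrt(i) - beta for i <= (c/beta)^2, and y_i^* = 0 for larger i."""
    i_beta = (c / beta) ** 2
    for i, v in enumerate(rearrangement(y), start=1):
        bound = c / math.sqrt(i) - beta if i <= i_beta else 0.0
        if v > bound + tol:
            return False
    return True

def soft_threshold(x, beta):
    """The f_beta^+ part: sgn(v) * max(|v| - beta, 0), coordinatewise."""
    return [math.copysign(max(abs(v) - beta, 0.0), v) for v in x]

def random_weak_l2_point(n, c, rng):
    """Place c*u_j/sqrt(j) (u_j in [0, 1)) at random positions with random signs."""
    values = [rng.choice((-1, 1)) * c * rng.random() / math.sqrt(j) for j in range(1, n + 1)]
    rng.shuffle(values)
    return values

rng = random.Random(0)
c, beta = 2.0, 0.3
points = [random_weak_l2_point(50, c, rng) for _ in range(200)]
print(all(in_scaled_weak_l2(x, c) and in_W_beta(soft_threshold(x, beta), c, beta)
          for x in points))  # True
```

Only $i_\beta = (c_8/\beta)^2$ coordinates of a point of $V_\beta^+$ can be nonzero, which is what keeps the expectation over $W_\beta$ small when $\beta \sim c'/\sqrt{k}$.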



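Note that

$$V_\beta^- = \left\{\sum_{\{i:|f(X_i)|<\beta\}}f(X_i)e_i + \sum_{\{i:|f(X_i)|\geq\beta\}}\beta\operatorname{sgn}(f(X_i))e_i : f\in H\right\} \subset \beta B_\infty^{ck}.$$

Therefore, by the optimal estimate in the sign-embedding theorem [12], there are constants $c_{11} \sim c_{10}$ and $c_{12} \sim c_{10}^2$ such that

$$\mathrm{VC}(\beta^{-1}V_\beta^-,c_{11}) \geq c_{12}k.$$

In other words, there is a set $I \subset \{1,\ldots,ck\}$ with $|I| \geq c_{12}k$ and a vector $(s_i)_{i\in I}$ such that for every $J \subset I$ there is $v_J \in V_\beta^-$ for which

$$v_J(i) \geq s_i + \beta c_{11} \ \text{ if } i \in J, \qquad v_J(i) \leq s_i - \beta c_{11} \ \text{ if } i \in I\setminus J,$$

and it is standard to verify that $(s_i)_{i\in I} \in \beta B_\infty^{|I|}$. It remains to show that $(X_i)_{i\in I}$ is $c_{11}\beta$-shattered by $F$ itself. To that end, fix any $J \subset I$, and let $f_J \in F$ be the function for which

$$v_J = \sum_{\{i:|f_J(X_i)|<\beta\}}f_J(X_i)e_i + \sum_{\{i:|f_J(X_i)|\geq\beta\}}\beta\operatorname{sgn}(f_J(X_i))e_i.$$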
Observe that $(I\setminus J)\cap\{i : \operatorname{sgn}(v_J(i)) > 0\} \subset \{i : |f_J(X_i)| < \beta\}$. Indeed, if there were some $i \in I\setminus J$ for which $\operatorname{sgn}(v_J(i)) > 0$ and $|f_J(X_i)| \geq \beta$, then on one hand $v_J(i) = \beta$, but on the other, $v_J(i) \leq s_i - \beta c_{11} \leq \beta(1-c_{11}) < \beta$, which is impossible. In a similar fashion, $J\cap\{i : \operatorname{sgn}(v_J(i)) < 0\} \subset \{i : |f_J(X_i)| < \beta\}$. Finally, fix $i \in J$. If $\operatorname{sgn}(v_J(i)) \geq 0$ then $f_J(X_i) \geq v_J(i) \geq s_i + \beta c_{11}$. Otherwise, $\operatorname{sgn}(v_J(i)) < 0$, implying that $v_J(i) = f_J(X_i)$. Hence, for every $i \in J$,

$$f_J(X_i) \geq s_i + \beta c_{11},$$

and by the same argument, for every $i \in I\setminus J$,

$$f_J(X_i) \leq s_i - \beta c_{11}.$$

Therefore, $\mathrm{VC}(F,\beta c_{11}) = \mathrm{VC}(F,c_{11}'/\sqrt{k}) \geq c_{12}k$, as claimed. □
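The last step only uses a pointwise property of the truncation. If $|s| \leq \beta$ and $m > 0$, then $\max(-\beta,\min(\beta,x)) \geq s+m$ forces $x \geq s+m$, and $\max(-\beta,\min(\beta,x)) \leq s-m$ forces $x \leq s-m$. Consequently, if the clipped vectors shatter a set of coordinates with margin $m$ and levels in $[-\beta,\beta]$, so do the original vectors. A randomized sketch of both statements (our own test harness; the names and parameters are hypothetical):

```python
import itertools
import random

def clip(x, beta):
    """Coordinate of f_beta^-: clip x to the interval [-beta, beta]."""
    return max(-beta, min(beta, x))

def is_shattered(V, coords, levels, margin):
    """Exhaustive check: for every subset J of coords some v in V satisfies
    v[i] >= s_i + margin on J and v[i] <= s_i - margin on coords minus J."""
    for r in range(len(coords) + 1):
        for J in itertools.combinations(coords, r):
            inside = set(J)
            if not any(all(v[i] >= s + margin if i in inside else v[i] <= s - margin
                           for i, s in zip(coords, levels)) for v in V):
                return False
    return True

def pointwise_lift_holds(x, s, margin, beta):
    """For |s| <= beta and margin > 0: a clipped value beyond s +/- margin
    forces the unclipped value beyond s +/- margin on the same side."""
    c = clip(x, beta)
    ok_up = (c < s + margin) or (x >= s + margin)
    ok_down = (c > s - margin) or (x <= s - margin)
    return ok_up and ok_down

rng = random.Random(0)
beta, margin = 1.0, 0.4
print(all(pointwise_lift_holds(rng.uniform(-3, 3), rng.uniform(-beta, beta), margin, beta)
          for _ in range(100_000)))  # True

# A configuration where clipping is active but the clipped vectors still shatter.
levels = [rng.uniform(-0.5, 0.5) for _ in range(3)]
V = [tuple(s + g * (margin + rng.uniform(0.0, 3.0)) for s, g in zip(levels, sigma))
     for sigma in itertools.product((-1, 1), repeat=3)]
clipped = [tuple(clip(x, beta) for x in v) for v in V]
print(is_shattered(clipped, (0, 1, 2), levels, margin),
      is_shattered(V, (0, 1, 2), levels, margin))  # True True
```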